Abstract

This paper presents a probabilistic analysis of plausible reasoning about defaults and about likelihood. 
“Likely” and “by default” are in fact treated as duals in the same sense as “possibility” and “necessity”. 
To model these four forms probabilistically, a logic QDP and its quantitative counterpart DP are derived 
that allow qualitative and corresponding quantitative reasoning. Consistency and consequence results 
for subsets of the logics are given that require at most a quadratic number of satisfiability tests in 
the underlying propositional logic. The quantitative logic shows how to track the propagation error 
inherent in these reasoning forms. The methodology and sound framework of the system highlight their
approximate nature, the dualities, and the need for complementary reasoning about relevance.

Index Terms: default, likelihood, plausible reasoning, qualitative reasoning, subjective probability. 
1 Introduction

Default reasoning is a form of non-monotonic reasoning which can be introduced, following Delgrande [1], as follows:


Many common sense assertions about the real world express default or prototypical properties of 
individuals or classes of individuals, rather than strict conditional relations. Thus, for example, 

“birds fly” attributes the property of flight to birds, even though birds with broken wings generally 
don’t fly, and quite probably no penguin flies. The import of “birds fly” then certainly isn’t that 
all birds fly, but rather is more along the lines of “typically birds fly”. 

This form of default reasoning then is concerned with drawing “typical” conclusions. There is a continuously 
growing and diverging variety of theoretical treatments on this and other forms of non-monotonic reasoning 
[2, 3, 1, 4, 5, 6, 7, 8].

Likelihood reasoning, another form of plausible reasoning, is more concerned with drawing “likely”
conclusions. For example, it is “likely” or reasonably possible that a coin tossed twice will land heads both times,
although this certainly is not “typically” the case. It is not “likely”, however, that a coin tossed twice will
land on its side one of those times. Although the laws of physics might treat this as a “possible” outcome,
for most practical purposes it is not. When one is considering possible outcomes, rather than looking at all of them,
likelihood reasoning is intended to be applied to find only those outcomes that are reasonably possible. A
historical perspective and further discussion for this form of reasoning can be found in [9].

The relationship between probability and plausible reasoning is best introduced by Polya [10, Chapter 
XV] in his work on reasoning in mathematics. Polya introduced a system of guides to the mathematician of 
the form: 


[Given a conjecture,] the verification of a consequence renders the conjecture more credible. 

Our confidence in a conjecture can only increase when an incompatible rival conjecture has been 
exploded. 

These guides were based on belief about conjectures modelled as subjective probabilities. Plausible reasoning
about default and likelihood, however, has more often been modelled in AI using purely logical formalisms
[5, 1, 6, 9] or non-standard probabilistic methods [11, 12, 13], although probability-motivated approaches exist
[7, 14]. Another form of reasoning seen in areas such as qualitative physics and model-based diagnosis
systems is the qualitative and approximate reasoning about physical devices. In this paper we combine the
two paradigms, probabilistic and qualitative/approximate, to model default and likelihood reasoning, and
so take up Polya’s theme more fully.

Surprisingly to some, it is controversial whether these plausible reasoning forms can be modelled with
probabilities¹ [15, 4, 16]. Likelihood reasoning and some forms of default reasoning, however, will always
remain problems of uncertainty or incomplete information. With some forms of non-monotonic reasoning, 
such as the closed-world assumption used in database systems and PROLOG, uncertainty does not exist 
because the default is actually a convention. These exceptions aside, there comes a time when something 
that is currently “typically” or “likely” to hold becomes known true or false. Until that time, we are in a 
state of uncertainty. However well logical systems may cope with modelling these reasoning forms, we should 
at least see how they can be modelled by a theory of uncertainty like probability. Perhaps there is more to 
learn? 


1 Cheeseman has said [15, p1002]:

Unfortunately, the logical style of reasoning is so prevalent in AI that many have attempted to force intrinsically
probabilistic situations into a logical straight jacket with predictable limited success.
This paper follows the view that subjective Bayesian probability theory provides a benchmark against
which methods for reasoning about uncertainty can be compared. The theory is a normative theory of 
reasoning about uncertainty, which means it gives a prescription for how uncertain reasoning should be done. 
The prescription itself has been derived from a set of fundamental axioms about belief (an introduction to 
this in the AI context is in [17]). One can model default and likelihood reasoning as either qualitative or 
quantitative approximations to full normative probabilistic reasoning. One can then argue that the resulting 
model seems to exhibit the required properties, and compare the model with some existing methods. 

A logic QDP (a mnemonic for qualitative default probabilistic logic) is developed here from a suitable 
quantitative counterpart DP as a demonstration. This yields a probabilistic system as a canvas on which a 
number of more significant issues can be sketched. These issues are: (1) the interplay between quantitative 
and qualitative forms of plausible reasoning, (2) the duality between default and likelihood reasoning, (3) 
the approximate nature of these reasoning forms, for instance, the propagation of errors in reasoning, and 
(4) the need for complementary reasoning about, for instance, relevance. 

The logic QDP, being probabilistically based, is easily able to express sentences² such as “most birds
fly”. This uses a “default” conditional-style operator “$\Rightarrow$”, as in: $\text{Bird}(x) \Rightarrow \text{Flies}(x)$. Similarly, “an
Australian is likely to drink Foster’s”³ can be represented with a “likely” conditional-style operator “$\leadsto$”, as in:
$\text{Australian} \leadsto \text{Drinks-Foster's}$. This operator also has iterated forms indicated by numeric superscripts,
such as “$\leadsto^{2}$”, that express lesser degrees of likelihood, as in: $\text{Australian} \leadsto^{2} \text{Drinks-another-Foster's}$, which
expresses the fact that, at least occasionally, an Australian will drink even more Foster’s.

Surprisingly enough, QDP is also able to express sentences more in the spirit of autoepistemic [18] and
default logics [2]. We can interpret the sentence “a professor has a Ph.D. unless known otherwise” two ways:

$$\Diamond(\text{Prof}(x) \wedge \text{Phd}(x)) \rightarrow (\text{Prof}(x) \Rightarrow \text{Phd}(x)),$$
$$\Diamond(\text{Prof}(x) \wedge \text{Phd}(x)) \rightarrow \Box(\text{Prof}(x) \rightarrow \text{Phd}(x)),$$

where the “$\Box$” operator represents necessity interpreted as “known with certainty”, and the dual “$\Diamond$” operator
represents possibility interpreted as “the negation is not known with certainty”. These are read as “if it is possible
that a particular professor has a Ph.D., then the professor most likely has a Ph.D.”, and “if it is possible
that a particular professor has a Ph.D., then the professor definitely has a Ph.D.” respectively. The default
logic representation, from $\text{Prof}(x) \wedge M\,\text{Phd}(x)$ infer $\text{Phd}(x)$, corresponds to the second reading. So the
possibility operator, “$\Diamond$”, behaves rather like the M operator of default logic.

The default component of the logic QDP is a variant and extension of Adams’ conditional logic [19], 
applied to default reasoning by Pearl [7]. The probabilistic semantics of QDP differs slightly from Adams’ 
logic however, because QDP is developed as a qualitative model for order of magnitude reasoning about 
probabilities, rather than being based on infinitesimal arguments. Like Adams’ logic, QDP can be combined 
with a notion of relevance or causality to resolve the so-called default paradoxes: the Yale shooting problem 
[8] and “can Joe read and write?” [7]. The logic also resolves the “vanishing subclasses” paradox [20].
These three paradoxes are discussed in Section 5. A fourth paradox is the lottery paradox [3], considered
in Section 3. This has a version both in default and likelihood reasoning, and provides an example of the
propagation of errors inherent in these reasoning forms.

The logic has been modelled after Delgrande’s modal conditional logic NP, which allowed reasoning
about default rules. Likewise, reasoning about defaults and likelihood is an important feature of the approach
here. For example, suppose you know a friend has travelled to Australia. Then they are likely to have visited
Sydney. Although any visitor to Sydney will typically see the Sydney Harbour Bridge, it is only likely that
they will visit Bondi Beach. We can infer that your friend is likely to (rather than “typically”) have seen
the Harbour Bridge but is less likely to have visited Bondi. In QDP, this argument can be summed up as


2 Although propositional sentences are dealt with throughout, pseudo-first-order sentences will sometimes be used. They are
effectively propositional if there are known to be a finite number of constants, no quantifiers are allowed, and a sentence with 
variables is intended to represent a sentence schema. 

3 For the record, many Australians don’t. Some drink XXXX, others Swan, .... 
follows:


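$$
\begin{array}{l}
\mathit{true} \leadsto \text{Visit-Sydney}(\text{Bruce}) \\
\text{Visit-Sydney}(x) \Rightarrow \text{See-Harbour-Bridge}(x) \\
\text{Visit-Sydney}(x) \leadsto \text{Visit-Bondi}(x) \\
\models_{QDP} (\mathit{true} \leadsto \text{See-Harbour-Bridge}(\text{Bruce})) \wedge (\mathit{true} \leadsto^{2} \text{Visit-Bondi}(\text{Bruce})).
\end{array}
$$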
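One way to see why the Bondi conclusion comes out an order weaker is the product rule: Pr(C) is at least Pr(C and B) = Pr(C|B) Pr(B). The following is a minimal numeric sketch of that bound only, not the paper's inference procedure. The values of EPS and DELTA are hypothetical, chosen purely for illustration.

```python
# Hypothetical bounds, chosen only for illustration (not from the paper).
EPS = 0.3     # "likely": belief no less than EPS
DELTA = 0.05  # "by default": error in belief at most DELTA


def lower_bound_via(pr_b_min, pr_c_given_b_min):
    """Lower bound on Pr(C), using Pr(C) >= Pr(C and B) = Pr(C | B) * Pr(B)."""
    return pr_c_given_b_min * pr_b_min


sydney = EPS                                 # true "likely" Visit-Sydney
bridge = lower_bound_via(sydney, 1 - DELTA)  # Visit-Sydney "by default" See-Harbour-Bridge
bondi = lower_bound_via(sydney, EPS)         # Visit-Sydney "likely" Visit-Bondi

print(f"Pr(Visit-Sydney)       >= {sydney:.3f}")
print(f"Pr(See-Harbour-Bridge) >= {bridge:.3f}")
print(f"Pr(Visit-Bondi)        >= {bondi:.3f}")
```

Under these assumed numbers the bridge bound stays close to EPS, while the Bondi bound drops to EPS squared, which is the sense in which the second conclusion is likely only to order 2.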

Consistency and consequence tests developed for subsets of the default and likelihood components of the 
logics also show how this form of reasoning can be automated in a manner requiring at most a quadratic 
number of satisfiability tests in the underlying propositional logic. With a careful choice of the underlying
propositional logic, the operation can then be quite efficient.

Perhaps most significantly, this reasoning can be easily complemented with error tracking facilities to 
indicate when the conclusions from a chain of such plausible reasoning may be becoming doubtful. For instance,
it is shown in some circumstances that error when reasoning about defaults can increase at most additively, 
while error when reasoning about likelihood can increase multiplicatively. It is not claimed, however, that 
these tracking facilities are a substitute for a more thorough probabilistic approach; they are merely an 
approximation. 

The paper proceeds as follows. First, the philosophical problem of modelling default reasoning
with probabilities is considered in Section 2. The corresponding discussion for likelihood reasoning is not
given here, because the principal objections in AI to modelling likelihood reasoning with probabilities do
not centre around the use of probability theory at all, but whether the modelling should be qualitative or
quantitative [9], and both are done here. A basic probabilistic framework for plausible reasoning is then
proposed in Section 3. Two logics, one with a probabilistic semantics, DP, and a qualitative version, QDP,
are then introduced in Section 4. Here, the duality between default and likelihood is introduced, and the
consistency and consequence results are developed. Section 5 demonstrates a methodology for applying the
qualitative logic, using relevance, and Section 6 draws some comparisons with other probabilistic approaches.


2 On Modelling Default Reasoning with Subjective Bayesian 
Probability 

Non-monotonic reasoning is generally considered to have three broad forms [4,18]: autoepistemic reasoning is 
reasoning about self-knowledge of beliefs [18], for instance, “if I had an older brother I would know about it”; 
conventions are used in the interpretation of natural language and with the closed-world assumption often 
made for database systems; and typicality or default reasoning is the form discussed in the Introduction. 

To illustrate the use of convention in natural language, consider the sentence “Birds lay eggs” [16], which
is certainly not true for the male half of the bird population. The sentence is more accurately stated as 
“[Female] birds lay eggs [to reproduce]”. The parts in the square brackets are implicit. Most people realise 
that male birds cannot lay eggs, so in the interests of brevity, the speaker leaves “female” to be inferred 
from the remainder of the sentence. This implicit convention is handled in nonmonotonic systems using 
knowledge of the form “an X is a Y unless known otherwise”. As illustrated in the introduction, this form 
can also be represented in a probabilistic framework using the probabilistic version of the possibility and 
necessity operators. 

When modelling the third form, typicality or default reasoning, we are hampered by the fact that there 
is little consensus as to its exact nature [20]. Hanks and McDermott [8] say, 

While it is not entirely clear exactly what constitutes default reasoning, the phenomenon
commonly manifests itself when we know what conclusions should be drawn about typical situations
or objects, . . .

Neufeld, Poole and Aleliunas [20] make an even stronger statement. They say, 
What, then, does a default mean? Within the default logic camp, we know of no work which
provides a semantics for defaults, in the sense that an experiment is described that can be 
performed in the semantic domain to verify the truth of a default. 

However, there is general agreement that default reasoning is a form of “defeasible inference”, or “plausible 
reasoning” [21,8], and that default conclusions have some (often small [21]) degree of uncertainty to them. 

Given that default reasoning is an admittedly specialised form of reasoning under uncertainty, it is 
natural to pose the question: can probability theory model default reasoning? (See also [7].) Critics of a 
Bayesian approach claim that probabilities are just not suited for describing “prototypical” knowledge. Most 
arguments, however, are based on some misunderstanding. 

Nutter [16] gives the following argument: 


For instance: if . . . the by now tormented example “Birds fly” really means “Most birds fly”, 
then birds don’t fly in spring. In the nesting season, baby birds outnumber adults. Baby birds 
don’t fly. Hence in the nesting season, “Most birds fly” is false. 

To the Bayesian, “Most birds fly” is interpreted as “if we know nothing else about a particular bird, then 
that bird most likely flies”. Notice the “most likely” conclusion is conditioned on our current knowledge 
about the bird. In particular, if we know it is nesting season, we cannot conclude the bird most likely flies 
because we do now know some additional thing about the bird. Two rules are relevant to the situations 
Nutter gives: “Most birds fly” and “In the nesting season, most birds don’t fly”. If we do not know that 
it is the nesting season, then the first rule is applicable because it usually is not the nesting season. The 
importance of conditioning probabilistic statements with context or current knowledge is a key feature of 
probabilistic reasoning and the cornerstone of the subjective Bayesian approach. 

McCarthy addresses a similar concern [4, p92]:


Note that the general probability that a bird can fly may be irrelevant, because we are interested 
in the facts that influence our opinion about whether a particular bird can fly in a particular 
situation. 

Classical statistics, with its concern about long term frequencies and sample spaces, can have problems in
adapting general knowledge to specific situations. The ability to adapt knowledge to particular situations,
however, is a hallmark of Bayesian methods. In this case, suppose we know that the bird is a male
yellow-bellied warbler, but we have no knowledge at all about this type of bird, or even what they may be similar
to. The only relevant knowledge we have is the general probability statement that most birds fly. In the 
absence of information to the contrary, we assume that other details about the bird are irrelevant (this 
is the maximum entropy argument [7]), which leads us to the quite reasonable conclusion that most male 
yellow-bellied warblers fly. We can now reason about this particular bird. 

There are, however, strong arguments that default reasoning should be modelled by probability with
caution. In practice, an intelligent system may not be able to supply precise probabilities for its beliefs 
and may not be able to perform all the exact calculations required to maintain its beliefs in accord with 
Bayesian principles as new evidence becomes available. People certainly cannot. It is of course not just the 
computation that causes problems but the communication required to prime and then update an intelligent 
system with an adequate set of beliefs. 

The normative properties of Bayesian theory assure us that despite these problems, by trying to
approximate the Bayesian approach our reasoning at least remains approximately rational. Essentially, it is the best
we can do in an inherently imprecise and computationally complex world. This view has been supported in
AI alone in a range of areas [22, 23, 24, 25].
3 A Framework for Plausible Reasoning

In this section, a basic framework for default and likelihood reasoning is developed. These two forms of 
reasoning are referred to below as plausible reasoning. Before presenting the framework, we first consider 
some major features of plausible reasoning, and then infer properties that a plausible reasoning system should 
have. 


3.1 Basic features of plausible reasoning 

There are several basic features of plausible reasoning that must affect the design of a plausible reasoning
system. While these can be derived from the probabilistic model presented in the next section, the features 
are presented here independently of any probabilistic analysis. 

Plausible reasoning is non-monotonic 

With standard logical reasoning, conclusions derivable from a set of sentences increase monotonically as the 
set of sentences is extended. That is, if $S$ logically implies $C$, and we extend $S$ with $A$, then $S \wedge A$ also
logically implies $C$.

Default reasoning is known to be non-monotonic [3]; the above monotonicity property breaks down. So 
while you might well believe that birds fly, on discovering that a certain bird is a baby bird in nesting 
season, you would no longer believe that particular bird flies. So your set of beliefs has extended one
way but contracted another. Similarly, something that initially seems likely can become, with changing
circumstances, well nigh impossible. 
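Read probabilistically, this is simply conditioning on more knowledge. The following minimal sketch uses a hypothetical joint distribution over two facts about a bird; the numbers are invented for illustration and do not come from the paper. Belief in "flies" is high when nothing else is known and collapses once "baby bird" is added to the conditioning knowledge.

```python
# Hypothetical joint distribution over (is_baby, flies) for a bird.
# The weights are illustrative assumptions and sum to 1.
JOINT = {
    (False, True): 0.855,
    (False, False): 0.045,
    (True, True): 0.002,
    (True, False): 0.098,
}


def pr(event, given=lambda world: True):
    """Pr(event | given), computed by summing over the joint distribution."""
    denominator = sum(p for world, p in JOINT.items() if given(world))
    numerator = sum(p for world, p in JOINT.items() if given(world) and event(world))
    return numerator / denominator


def is_baby(world):
    return world[0]


def flies(world):
    return world[1]


p_flies = pr(flies)                           # knowing only "bird"
p_flies_given_baby = pr(flies, given=is_baby)  # knowing "bird" and "baby"

print(f"Pr(flies)        = {p_flies:.3f}")
print(f"Pr(flies | baby) = {p_flies_given_baby:.3f}")
```

Adding a fact to the right-hand side of the conditioning operator replaces one conditional probability by another, so a conclusion that held by default need not survive the extension.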

Error combines along a chain of plausible reasoning 

A second key feature of standard logical reasoning is that if the premises are known to be true, then the 
conclusion from a long chain of reasoning steps must also be true. With plausible reasoning, however, there 
is an inherent element of uncertainty involved, so it is natural to suspect this key feature might break down. 

The famous lottery paradox [3] is an excellent example of this. For a single lottery entrant, Leslie say, 
one can conclude by default that Leslie will not win the lottery. But we can apply this sort of reasoning to 
every potential lottery entrant. There are two paradoxes here. First, why is it that someone actually wins
the lottery? Second, why does Leslie bother to enter the lottery in the first place?

For a lottery with one million entrants, the default conclusion about Leslie has an obvious statistical error 
of one ten-thousandth of 1%, acceptable by most standards. If we make a logical deduction based on one 
million such default conclusions, the one million errors certainly combine to give a total error of 100% (after 
all, someone definitely wins the lottery). That Leslie would enter the lottery at all is as much irrational 
behaviour due to the effect of large sums of money, as it is the result of plausible reasoning. Perhaps it 
is because most people do not mind losing one dollar just to be given the remotest chance of winning one 
million dollars. In the former, their life is no different; in the latter, well . . . 
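The arithmetic behind the first paradox can be checked directly. The sketch below assumes a lottery with exactly one winner among equally likely entrants, so the errors of the individual default conclusions add (in general, the sum is only an upper bound). The function name is a hypothetical helper, not something defined in the paper.

```python
from fractions import Fraction

ENTRANTS = 1_000_000

# Error of the single default conclusion "this entrant will not win".
error_per_default = Fraction(1, ENTRANTS)


def combined_error_bound(error, count):
    """Upper bound on the chance that at least one of `count` default
    conclusions, each with the given error, is wrong (capped at 1)."""
    return min(Fraction(1), count * error)


print("error per conclusion, in percent:", float(error_per_default * 100))
for count in (1, 1_000, ENTRANTS):
    bound = combined_error_bound(error_per_default, count)
    print(f"{count:>9} conclusions -> combined error at most {float(bound):.6f}")
```

Exact fractions are used so that one million copies of the error sum to exactly 1, matching the observation that someone definitely wins.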

This last point anticipates the next basic feature of plausible reasoning. 

Plausible reasoning is affected by the decision context

After a system performs plausible reasoning, it would typically decide some course of action. As a result of 
the action, the system might make some gain or incur some loss. For Leslie in the lottery situation above 
the potential loss is one dollar while the potential gain is one million minus one dollars. This feature of 
reasoning is referred to as the decision context and the losses and gains as the utilities. 

Shoham provides the following illustration of how the decision context can affect plausible reasoning.

. . . think of making the default inference “people you’ll meet on the street will not stab you in the 
back” in a city in which only 5% of the population are back stabbers. In this case the relatively 
small chance of being hurt seems to outweigh the computational resources needed to reason about
individual people on the street, and the discomfort of wearing a steel-plated vest. Notice that if 
the 5% dropped to 0.00000000005%, we’d take off the armor and stop looking darkly at passers 
by. 

Clearly, the decision context should be taken into consideration (see also [26]). 

3.2 Basic properties of a plausible reasoning system 

The above features can be used to argue that a method for plausible reasoning should have certain basic 
properties. 

A first property is that plausible reasoning needs to be sensitive both to the current knowledge of the 
system and to the decision context. This is directly suggested by the features given in the previous subsection. 
Sensitivity to the decision context can be handled by targeting a default system for a single decision context. 

Now the number of different states of knowledge is potentially exponential in the number of propositional 
symbols. So a system could not reasonably keep separate default rules for each possible state of knowledge 
and decision context. To get around this problem, a second property seems important: it should be possible 
to reason about plausible rules and the relevance of different facts to the applicability of a plausible rule. 
It may also be useful to give a system the ability to compile plausible rules from some more fundamental 
knowledge form. 

Third, because of non-monotonicity and error propagation, plausible conclusions need to be flagged as
such, and should not be confused with the current knowledge. In fact, because of the possible need for
weighing up belief when combining error or considering the decision context, plausible conclusions may need
to be tagged with some form of qualitative or quantitative measure of belief. Whether this is done and how
surely depends on the application concerned; no single approach will be favoured in this paper.

3.3 A probabilistic framework 

It is beyond the scope of this paper to cover the basic notions of probability and decision theory underlying 
subsequent sections. Suitable introductions from an AI perspective can be found in [26,27,7]. The problem 
of the decision context in plausible reasoning is side-stepped here by assuming that a default system is being 
prepared for a specific binary (yes/no) decision. In this simple case, a decision has to be made whether some 
condition, A say, is “true” or “false”. Once utilities of the problem are taken into account, the problem 
invariably reduces to “is Pr(A) > p?” for some $p \in [0, 1]$. Given a particular decision context for a binary
decision, we can therefore use approximate inequality reasoning to make decisions in a normative manner.
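The reduction to a threshold follows from comparing expected utilities. Acting as if A is true is preferred when Pr(A) · gain > (1 − Pr(A)) · loss, that is, when Pr(A) > loss / (gain + loss). The sketch below illustrates that standard calculation; the function name and the utility figures (taken from the lottery example) are assumptions for illustration.

```python
from fractions import Fraction


def decision_threshold(u_accept_true, u_accept_false, u_reject_true, u_reject_false):
    """Return p such that acting as if A is true has the higher expected
    utility exactly when Pr(A) > p.

    u_accept_true  : utility of acting as if A holds when A is true
    u_accept_false : utility of acting as if A holds when A is false
    u_reject_true  : utility of acting as if A fails when A is true
    u_reject_false : utility of acting as if A fails when A is false
    """
    gain = u_accept_true - u_reject_true    # benefit of accepting when A is true
    loss = u_reject_false - u_accept_false  # cost of accepting when A is false
    if gain <= 0 or loss <= 0:
        raise ValueError("one action dominates; no threshold is needed")
    return Fraction(loss) / Fraction(gain + loss)


# Lottery example: A = "Leslie wins"; accepting = buying a one dollar ticket.
p = decision_threshold(
    u_accept_true=999_999,  # one million minus the ticket price
    u_accept_false=-1,      # the ticket price is lost
    u_reject_true=0,
    u_reject_false=0,
)
print("buy a ticket only if Pr(win) >", p)
```

With these utilities the threshold equals the chance of winning a fair one-in-a-million lottery, so the expected-utility calculation alone leaves Leslie indifferent.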

The notion of probability used here is subjective probability, which is a measure of belief ascribed to
some proposition by an intelligent system. This is represented as $\Pr(A|B) \in [0, 1]$, interpreted as follows:
a particular intelligent system, on knowing just $B$, has a measure of belief $\Pr(A|B)$ in $A$ being true. The
“|” operator is called the conditioning operator. Its left hand side is the proposition whose belief is being 
considered and its right hand side specifies all current knowledge relevant to A of the intelligent system. A 
probability distribution is a particular function Pr consistent with the standard axioms of probability theory. 

A probabilistic framework for plausible reasoning is based on the assumptions that (1) plausible
statements that are uncertain should be interpreted in some way using subjective probability statements, and
that (2) methods of plausible reasoning which deal with uncertainty should be interpreted as approximations
to subjective probability or decision theory. We shall treat a default conclusion as a plausible proposition in
which one has “sufficiently high belief”. Similarly, a likely conclusion is a plausible proposition in which one
has “belief that it is reasonably possible”. In both cases, the belief is modelled as subjective probability and 
should be conditioned on current knowledge using the conditioning operator. Due to the decision theoretic 
argument above, both these types of plausible reasoning should, in many cases, be a good approximation to 
the normative probabilistic approach. 

Notice that this rough probabilistic interpretation of defaults and likelihood automatically provides a 
framework which addresses the basic properties of plausible reasoning discussed in this section. Decision 
theory provides the basis for considering the decision context. The conditioning operator provides the
mechanism for making plausible conclusions sensitive to a system’s current knowledge and for keeping plausible
conclusions (on the left hand side) separate from current knowledge (on the right hand side). Probability 
theory also provides the potential for developing ways of reasoning about plausible rules, and with the notion 
of independence, ways of reasoning about relevance. Some of these connections are explored more fully in 
the next section. Finally, probability theory provides a framework for both testing and developing default 
rules for a given application, for instance, by learning them from examples. 


4 Default Probabilistic Logic 

This section introduces two logics for default and likelihood reasoning: a probabilistic logic DP and its 
qualitative counterpart QDP. These are applicable in the broad framework given in Section 3 for reasoning 
about defaults and likelihood. Notation and semantics of these logics are first covered in Sections 4.2 and 4.3. 
Some basic properties of the logics are then outlined. One theme of this paper is the importance of reasoning
about relevance; Section 4.5 motivates this and shows how relevance information can interface with default 
and likelihood reasoning. Another theme of the paper is the approximate nature of both these reasoning 
forms; Section 4.6 shows how, for small errors at least, the quantitative logic DP can be treated as a simple 
numeric extension of the qualitative logic QDP. This last section presents consistency and consequence 
results for fragments of both logics. 

4.1 Introduction 

DP is a propositional logic annotated with probability bounds, and has a probabilistic rather than a possible 
world semantics. This allows the sort of inequality reasoning found in Quinlan’s INFERNO [28]. Inequality 
reasoning is an approximation to normative reasoning about point probabilities when a decision is binary, 
as explained in Section 3.3. So the justification for DP is approximation, rather than some fundamental 
principle about intervals or fuzzy sets for reasoning under uncertainty. In this sense, it differs in philosophy 
from Ginsberg’s suggestion [12] or Dubois and Prade’s treatment of syllogisms [13].

QDP has the annotations dropped, and the default component is almost identical to Geffner and Pearl’s 
logic of defaults [7,29] borrowed from Adams’ logic of conditionals [30,19]. QDP is also similar to Delgrande’s 
conditional logic NP [1].

QDP is designed to be a qualitative counterpart of DP. It is intended to be an approximation to DP for 
reasoning about “small” but not infinitesimal probabilities. The semantics of QDP complements DP and 
is based on order of magnitude reasoning. Like NP, dynamic aspects of plausible reasoning (for instance, 
involving action and time) are not handled directly by either DP or QDP, although they can often be 
handled with a simple situation calculus, as is done in Section 5.3. In the general case, an extension of the 
logic would be required. 

4.2 Basic notation 

A standard propositional language denoted $L_P$ is used here. This is formed in the usual manner from a
finite set of atomic propositions $P = \{p_1, \ldots, p_n\}$ together with true and false, and the standard connectives
$\neg$ (negation), $\rightarrow$ (conditional), $\wedge$ (conjunction), $\vee$ (disjunction) and $\leftrightarrow$ (biconditional). “$\models A$” denotes that
propositional formula $A$ is a theorem of the usual propositional logic.

Probability distributions can be given over the language $L_P$ as follows. An event space $E_P$, a mutually
exclusive and exhaustive set of events, is readily constructed from a subset of $L_P$. Given $n$ atomic propositions
$P$ as described above, this would have cardinality $2^n$ and one such set is given by

$$E_P = \{\, L_1 \wedge \cdots \wedge L_n \mid \text{for } i = 1, \ldots, n,\ L_i = p_i \text{ or } \neg p_i \,\}. \tag{1}$$
A probability distribution $\Pr : E_P \rightarrow [0, 1]$ maps events to measures of belief. For $A, B \in L_P$,

$$\Pr(A) = \sum_{e \in E_P,\ \models e \rightarrow A} \Pr(e),$$

$$\Pr(B|A) = \begin{cases} \dfrac{\Pr(A \wedge B)}{\Pr(A)} & \text{if } \Pr(A) > 0, \\ 1 & \text{otherwise.} \end{cases}$$

In many probability texts, if Pr(A) = 0 then Pr(B|A) is undefined. Instead we assert that if Pr(A) = 0
then Pr(B|A) = 1. This means we can reason about conditional probabilities even if the antecedent of the
conditioning is false. A probability distribution like Pr above is termed a distribution over $L_P$.
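The construction above can be sketched in a few lines: enumerate the event space as truth assignments, attach a weight to each event, and sum the weights of the events that entail a formula. This is a minimal illustration only. The atoms, the weights, and the representation of formulas as Python predicates are all assumptions introduced here.

```python
from fractions import Fraction
from itertools import product

ATOMS = ("bird", "flies")  # hypothetical atomic propositions


def event_space(atoms):
    """All 2**n truth assignments; each stands for one conjunction of literals."""
    return [dict(zip(atoms, values))
            for values in product((True, False), repeat=len(atoms))]


EVENTS = event_space(ATOMS)
# Hypothetical weights, in the order produced by event_space; they sum to 1.
WEIGHTS = [Fraction(18, 100), Fraction(2, 100), Fraction(10, 100), Fraction(70, 100)]


def pr(formula):
    """Pr(A): total weight of the events in which the formula A holds."""
    return sum((w for e, w in zip(EVENTS, WEIGHTS) if formula(e)), Fraction(0))


def pr_given(b, a):
    """Pr(B|A), using the convention Pr(B|A) = 1 when Pr(A) = 0."""
    pa = pr(a)
    if pa == 0:
        return Fraction(1)
    return pr(lambda e: a(e) and b(e)) / pa


def bird(e):
    return e["bird"]


def flies(e):
    return e["flies"]


def contradiction(e):
    return e["bird"] and not e["bird"]


print(len(EVENTS))                     # 2**2 events
print(pr(bird))                        # 1/5
print(pr_given(flies, bird))           # 9/10
print(pr_given(flies, contradiction))  # 1, by the convention for Pr(A) = 0
```

Representing a formula as a predicate over events stands in for the test that the event entails the formula, since each event fixes the truth value of every atom.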

The probabilistic logic DP describes constraints on probability distributions over the language $L_P$. It
is built on the language $D_P$ that is constructed from $L_P$ together with four modal operators: the unary
connectives $\Box$ (necessity) and $\Diamond$ (possibility), and the binary connectives $\Rightarrow_\epsilon$ (default with error bound) and
$\leadsto_\epsilon$ (likelihood with lower bound). There is no nesting of these operators. Nesting would represent second and
higher-order probability statements [31], as used in learning to reason about belief in probabilistic models
[25], but is unnecessary for the initial treatment here. The operators can be interpreted as follows.


$\Box A$: $A$ is necessarily true in any situation.

$\Diamond A$: Some situation can possibly arise in which $A$ is true.

$A \Rightarrow_\epsilon B$: Given that you know just $A$ about the current situation, it is safe to infer $B$ by default (with
error in belief at most $\epsilon$).

$A \leadsto_\epsilon B$: Given that you know just $A$ about the current situation, $B$ is at least likely (with belief no less
than $\epsilon$).

In the language $QD_P$ the subscripts are dropped. $QD_P$ also has successively weaker forms of the likelihood
operator. $A \leadsto B$ denotes “likely”, whereas $A \leadsto^{2} B$ would denote “barely likely”, etc. This is related to
the iterated likelihood operator found in [14].

$A \leadsto^{n} B$: Given that you know just $A$ about the current situation, $B$ is at least likely to be ... to be likely
(to order $n$).

Doth the likelihood and default operators are conditional operators, in a similar sense to [1]. For instance, 
in the cases above each is conditioned on A. It will be shown later that it is unnecessary for the necessity 
and possibility operators to have conditional forms. 

Definition 4.1 The sentences or well formed formulae (wffs) of $D_P$ comprise the least set such that

1. If $A \in L_P$ then $\Box A$ is a wff.

2. If $A, B \in L_P$ then $A \Rightarrow_\epsilon B$ is a wff for $0 < \epsilon < 1$.

3. If $D, E \in D_P$ then $\neg D$ and $D \rightarrow E$ are wffs.

Conjunction ($\wedge$), disjunction ($\vee$), and biconditional ($\leftrightarrow$) on sentences in $D_P$, and possibility ($\Diamond$) and
likelihood ($\rightsquigarrow_e$) on sentences in $L_P$ are introduced by definition.

Definition 4.2 The sentences or well formed formulae of $QD_P$ consist of the sentences of $D_P$ with the
numeric subscripts dropped from “$\Rightarrow$” and “$\rightsquigarrow$”. The operator “$\rightsquigarrow$” may have optional integer superscripts
weakening the order of likelihood.





Some examples of QDP sentences were given in the introduction. The four modal operators have operator
precedence midway between disjunction and conditional/biconditional. So a disjunction binds before a
default operator, and a default operator binds before a conditional. For instance, the sentence

$$A \vee B \wedge C \Rightarrow D \rightarrow \Diamond E \wedge F$$

is identical to the sentence

$$((A \vee (B \wedge C)) \Rightarrow D) \rightarrow \Diamond (E \wedge F) .$$

However,

$$\Diamond A \wedge B \Rightarrow C$$

is identical to the sentence

$$(\Diamond A) \wedge (B \Rightarrow C) ,$$

because otherwise the sentence does not parse.

4.3 Semantics 

In DP, “$\models_{Pr} D$” denotes that $D \in D_P$ is true for the probability distribution Pr. Pr plays a role not unlike
an interpretation in standard propositional logic.

Definition 4.3 Given a probability distribution Pr on $L_P$, “$\models_{Pr}$” is defined on sentences from $D_P$ as
follows.

1. $\models_{Pr} \Box A$ if and only if $\Pr(A) = 1$.

2. $\models_{Pr} A \Rightarrow_\epsilon B$ if and only if $\Pr(B \mid A) \ge 1 - \epsilon$.

3. $\models_{Pr} \neg D$ if and only if not $\models_{Pr} D$.

4. $\models_{Pr} D \rightarrow E$ if and only if not $\models_{Pr} D$ or $\models_{Pr} E$.

Possibility and likelihood are by definition dual operators for necessity and default respectively. “$\Diamond A$”
is defined as “$\neg \Box \neg A$”, so $\models_{Pr} \Diamond A$ if and only if $\Pr(A) > 0$. “$A \rightsquigarrow_e B$” is defined as “$\neg (A \Rightarrow_e \neg B)$”, so
$\models_{Pr} A \rightsquigarrow_e B$ if and only if $\Pr(B \mid A) > e$. In addition, “$\Rightarrow_\epsilon B$” is shorthand for “$\mathit{true} \Rightarrow_\epsilon B$”, and likewise for “$\rightsquigarrow_e B$”.

If the necessity operator were to have a conditional version, it would have the semantics $\Pr(B \mid A) = 1$, but
since this is equivalent to $\Pr(A \rightarrow B) = 1$, a conditional form of necessity can be adequately constructed as
$\Box (A \rightarrow B)$. Likewise, a conditional version of the possibility operator can be constructed as $\Diamond A \rightarrow \Diamond (A \wedge B)$.
A map translating probabilities to subsequent modal representation is given in Figure 1. By convention,
“$\Rightarrow$” is subscripted by Greek letters $\epsilon$, $\delta$, etc., which are intended to be small ($\ll 1$), whereas “$\rightsquigarrow$” is
subscripted by the letters e, f, etc., which are intended to be not as small. This is no absolute restriction;
it gives an indication of the intent of the sentences.
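
As an illustration of Definition 4.3, the following sketch evaluates the four operators against a small explicit distribution. The bird/flies distribution, the helper names, and the reading of the default as $\Pr(B \mid A) \ge 1 - \epsilon$ are assumptions made for the example. Possibility and likelihood are implemented literally as the duals of necessity and default.

```python
from fractions import Fraction


def pr(dist, a):
    """Pr(A) for a distribution given as {event: weight}."""
    return sum(w for e, w in dist.items() if a(e))


def pr_given(dist, b, a):
    """Pr(B|A), taken to be 1 when Pr(A) = 0."""
    pa = pr(dist, a)
    return Fraction(1) if pa == 0 else pr(dist, lambda e: a(e) and b(e)) / pa


def necessary(dist, a):
    """Box A: Pr(A) = 1."""
    return pr(dist, a) == 1


def possible(dist, a):
    """Diamond A, defined as not Box not A."""
    return not necessary(dist, lambda e: not a(e))


def default(dist, a, b, eps):
    """A =>_eps B: Pr(B|A) >= 1 - eps."""
    return pr_given(dist, b, a) >= 1 - eps


def likely(dist, a, b, bound):
    """A ~>_e B, defined as not (A =>_e not B)."""
    return not default(dist, a, lambda e: not b(e), bound)


# Hypothetical distribution over (bird, flies) events.
dist = {
    (True, True): Fraction(90, 100),
    (True, False): Fraction(5, 100),
    (False, False): Fraction(5, 100),
}
bird = lambda e: e[0]
flies = lambda e: e[1]
never = lambda e: False  # an impossible antecedent
```

With an impossible antecedent such as `never`, the default holds for any error bound and the likelihood fails for any lower bound, which follows from the convention $\Pr(B \mid A) = 1$ when $\Pr(A) = 0$.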

Definition 4.4 A sentence $D \in D_P$ is a theorem of the probabilistic logic DP if $\models_{Pr} D$ for all possible
probability distributions Pr. This is denoted “$\models_{DP} D$”. D is a consequence of a set of sentences T if there
are $D_1, \dots, D_n \in T$ such that $\models_{DP} (D_1 \wedge \dots \wedge D_n) \rightarrow D$. D is consistent if $\neg D$ is not a theorem of DP.

To obtain qualitative rules about default and likelihood from the quantitative rules in DP, we can
perform order of magnitude reasoning. We can consider a representative default error, $\epsilon$, where $\epsilon$ might
be less than 0.01, or whatever the decision context requires. Likewise, we can consider a representative
default likelihood, e, where e might be greater than 0.05, say. The choice of modelling particular limits
rather than some arbitrary infinitesimal is motivated by the decision theoretic argument at the beginning
of Section 3.3. In order to approximate the behaviour of our reasoning with these particular limits in mind,
we can parameterise the system by $\epsilon$ and e and consider only approximate calculations to $O(\epsilon)$ and $O(e)$.
A map translating probabilities to these kinds of qualitative values is given in Figure 2. The hashed regions
represent those fuzzy boundaries where the qualitative reasoning becomes most susceptible to error.

$QD_P$ is defined in a manner such that $\epsilon$ and e are arbitrarily small, but $\epsilon$ is also arbitrarily smaller
than e. Of course, it is unrealistic to expect arbitrarily small magnitudes for $\epsilon$ and e to be achieved, let
alone the right relative magnitudes. This, however, is irrelevant as far as the application of the logic is
concerned. The “arbitrarily small” magnitudes are only being used as a theoretical device to investigate the
approximate behaviour of notions like “$\Rightarrow$” and “$\rightsquigarrow$” for $\epsilon$ being small and e being not quite as small (see
also [7, Section 10.2.4]). In addition, the choice of relative magnitude between $\epsilon$ and e is a particular design
decision that might just as well have been made some other way. Applications of QDP should of course take
this into account.


Definition 4.5 A sentence $D \in QD_P$ is a theorem of the qualitative probabilistic logic QDP if there exists
a theorem $D' \in D_P$ corresponding to D (that is, identical except for any super or subscripts), in which all
subscripts to “$\Rightarrow$” and “$\rightsquigarrow$” are parameterised by some variables $\epsilon$ and e, each subscript to “$\Rightarrow$” is of
order $\epsilon$ as $\epsilon$ approaches 0 and e remains finite, and each subscript in $D'$ corresponding to “$\rightsquigarrow^n$” in D is of
order $e^n$ as e and $\epsilon$ approach 0. This is denoted “$\models_{QDP} D$”. Consequence and consistency are defined as
before.


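From the definition of “$\rightsquigarrow$” (for $e > \epsilon$) it follows that

$$\models_{DP} \; \rightsquigarrow_e A \rightarrow \neg (\Rightarrow_\epsilon \neg A) .$$

Consequently,

$$\models_{QDP} \; \rightsquigarrow A \rightarrow \neg (\Rightarrow \neg A) .$$

That is, if something is likely, its negation cannot be true by default. But the complementary sentence
$(\rightsquigarrow A \vee \Rightarrow \neg A)$ is not a theorem.

It follows directly from this definition that the set of theorems of QDP is closed under application of
modus ponens and conjunction. That is,

$$\models_{QDP} D \text{ and } \models_{QDP} D \rightarrow E \text{ implies } \models_{QDP} E ,$$

$$\models_{QDP} D \text{ and } \models_{QDP} E \text{ if and only if } \models_{QDP} D \wedge E .$$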

Because the definition of QDP is based on an order of magnitude argument, there are potential pitfalls with
these closure properties. Order of magnitude arguments invariably give dubious results when the constant
factors become too large. Suppose a lottery has 1,000,000 participants. The following sentence can be
shown to be a theorem of DP.

$$\bigwedge_{i=1}^{1{,}000{,}000} \Rightarrow_\epsilon (\text{person } i \text{ will not win the lottery}) \;\rightarrow\; \Rightarrow_{1{,}000{,}000 \, \epsilon} (\text{no-one will win the lottery}) . \qquad (3)$$

Moreover, replacing the error bound $1{,}000{,}000 \, \epsilon$ by $999{,}999 \, \epsilon$ yields a sentence that is not a theorem of
DP. Without the error bounds, the sentence would seem to read “if, by default, any particular person will
not win the lottery, then, by default, no-one will win the lottery at all”. The illusory lottery paradox has
reappeared. In DP this is not the correct reading because with the natural value for $\epsilon$, 1/1,000,000, the right
hand side of rule (3) is impotent (its default error is 1). In QDP, unfortunately, it is the correct reading:
QDP drops the subscripts (both are of order $\epsilon$ as $\epsilon$ approaches 0) and loses the error information.
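
The arithmetic behind rule (3) can be checked on a scaled-down lottery. The sketch below assumes a hypothetical lottery with N = 1000 participants and exactly one winner drawn uniformly, and it bounds the error of the conjunction by the sum of the individual default errors, capped at 1. The function name `conjunction_error` is an illustrative choice.

```python
from fractions import Fraction


def conjunction_error(errors):
    """Additive bound on the error of a conjunction of defaults, capped at 1."""
    return min(Fraction(1), sum(errors, Fraction(0)))


N = 1000                       # hypothetical number of participants
eps = Fraction(1, N)           # natural default error for "person i will not win"

pr_person_loses = 1 - eps      # each premise holds with error exactly eps
pr_nobody_wins = Fraction(0)   # exactly one person always wins

bound = conjunction_error([eps] * N)

# The consequent with error N * eps is vacuous: it only asserts Pr >= 0.
holds_with_full_bound = pr_nobody_wins >= 1 - bound
# Tightening the bound to (N - 1) * eps makes the consequent false here.
holds_with_tighter_bound = pr_nobody_wins >= 1 - (N - 1) * eps
```

In this scaled-down case the accumulated error reaches 1, so the conclusion carries no information, which is exactly what the quantitative subscript records and the qualitative reading loses.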

If we wish a purely qualitative default logic to be closed under conjunction and modus ponens, two
seemingly intuitive properties, then we have no choice but to accept that the above kind of anomaly may
occur. People get around this with an intuitive knowledge of where plausible reasoning is likely to break down,
for instance, by not making default or likelihood inferences to any great depth: “don’t rest your argument on
too many assumptions, something is bound to go wrong along the way!”. Default and likelihood reasoning
may well produce incorrect results when carried on indefinitely; they should, however, be “locally” correct.
Imprecision is an inherent property of plausible reasoning, so knowledge of how to contain the imprecision
is a prerequisite for safe plausible reasoning. Hence the importance of DP in understanding QDP.


4.4 Basic theorems 

This section introduces a few basic theorem schemata, and discusses several notable but unrelated properties 
of the logics. Examples of using the logic QDP are given later in Section 5. 

First, the default and likelihood operators can be broken down into two components, according to whether
the antecedent is possible or impossible. This is done using

$$\models_{QDP} A \Rightarrow B \leftrightarrow (\Box \neg A \vee \Diamond A \wedge A \Rightarrow B) . \qquad (4)$$

The second component here, $\Diamond A \wedge (A \Rightarrow B)$, is referred to as the proper default operator, and likewise for the
likelihood operator. This corresponds to Adams’ notion of the conditional over “proper” distributions [19,
p49], that is, distributions where the antecedent of the operator must be possible. The unmodified, improper
version of the default, $A \Rightarrow B$, corresponds to Adams’ original notion of the conditional [30]. While the
mathematics of the improper default is generally easier, it is sometimes better to break down the default
and likelihood operators into the two components, and then put the pieces back together at the end.

Second, both DP and QDP can be seen as natural extensions to propositional logic. For instance, the
theorems given later in Table 1 encode the provability relation in propositional logic. The following
lemma further highlights the connection.

Lemma 4.1 First, all substitution instances of the theorems and rules of inference of standard propositional
logic that are sentences of $D_P$ hold for DP. Second, in DP necessary equivalences can be substituted. That is,

$$\models_{DP} \Box (A \leftrightarrow B) \rightarrow (D(A) \leftrightarrow D(B)) ,$$

where D(A) denotes any sentence of $D_P$ with an occurrence of the propositional formula A in a particular
position. Corresponding results for QDP hold.

Third, some examples of theorem schemata of DP are given in Tables 1-3. These hold for d , e, € and 
6 all less than Certain dual forms, either on or on are given in the third column. These are 

obtained by restructuring the formula and converting either “$\rightsquigarrow$” or “$\Rightarrow$” to their dual. In each case, either
the original form or the dual form can be proven by the consistency or consequence theorems presented in
Section 4.6. For each DP theorem in Tables 1-3, the QDP sentence obtained by removing subscripts (and in
the case of the duals for theorems T14 and T16, making the operator “$\rightsquigarrow^2$”) is a theorem of QDP.

One important aspect of any DP theorem is the relationship between errors on the defaults and likelihoods.
For instance, we can rewrite theorem T17' as

$$(A \rightsquigarrow_e C) \wedge (B \rightsquigarrow_d C) \rightarrow A \vee B \rightsquigarrow_f C ,$$

and note that this only holds for some values of f, and in particular holds for

$$f \le \frac{e d}{e + d - e d} .$$

In this case, f represents an error propagation function, which relates the errors in the DP theorem. If we
were to apply this theorem in some chain of reasoning to deduce $A \vee B \rightsquigarrow C$, then we could either choose
to forget about the error f, as we implicitly do when using QDP, or we could use the error propagation
function to compute a value for f from e and d. Bear in mind that an error propagation function only
represents a worst-case bound on error. If we were to do a more precise probabilistic analysis, we may find
that the error has shrunk to nothing; the error propagation function, however, represents an upper bound on
what the error can be. In Section 4.6 it is shown that for small errors, DP behaves just like QDP, so a system
for reasoning about defaults and likelihoods can be constructed using the qualitative logic QDP, and then
optionally, error tracking facilities can be grafted on top with the use of error propagation functions to give
approximate probabilistic reasoning.
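
A small sketch can make the role of such a function concrete. It assumes the bound $f = ed / (e + d - ed)$ for the disjunction of antecedents, and builds a hypothetical worst-case distribution over three regions of $A \vee B$ in which C lies entirely inside $A \wedge B$. The region labels and function names are illustrative only.

```python
from fractions import Fraction


def or_likelihood_bound(e, d):
    """Assumed worst-case value of Pr(C | A or B) when Pr(C|A) = e and Pr(C|B) = d."""
    return e * d / (e + d - e * d)


def worst_case(e, d):
    """Weights of three disjoint regions of A-or-B that attain the bound.

    C coincides with the region "A&B&C"; the other two regions lie outside C.
    """
    c = or_likelihood_bound(e, d)
    return {"A&B&C": c, "A only": c / e - c, "B only": c / d - c}


e, d = Fraction(1, 2), Fraction(1, 4)
w = worst_case(e, d)

pr_a = w["A&B&C"] + w["A only"]
pr_b = w["A&B&C"] + w["B only"]
pr_c_given_a = w["A&B&C"] / pr_a
pr_c_given_b = w["A&B&C"] / pr_b
pr_c_given_a_or_b = w["A&B&C"] / sum(w.values())
```

In this construction the two premises hold with exactly the stated likelihoods while the conclusion sits exactly at the bound, so no larger value of f could be guaranteed in general.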

Finally, theorems of the logics can be generalised by uniformly changing conditioning information. 

Lemma 4.2 Any theorem of DP (QDP) can be transformed to another by uniformly changing conditioning
information. Given conditioning information C, a formula D is transformed by uniformly applying the
following transformations to all non-propositional operators in “$\Diamond \mathit{true} \rightarrow D$”:

$\Box A \;\mapsto\; \Box (C \rightarrow A)$
$\Diamond A \;\mapsto\; \Diamond (C \wedge A)$
$B \Rightarrow_\epsilon A \;\mapsto\; (C \wedge B) \Rightarrow_\epsilon A$
$B \rightsquigarrow_e A \;\mapsto\; (C \wedge B) \rightsquigarrow_e A$

Versions of some of the theorems extended using this transformation are given in Table 4. Notice that
for theorems T11', T12', T13', and T14', the initial term “$\Diamond C$” has been dropped: this is safe because
the “$\Rightarrow$” and “$\rightsquigarrow$” operators are always true and false respectively if the conditioning part is necessarily
equivalent to false. A similar situation holds for theorem T6'.


4.5 Relevance 

The antecedent of a default or likelihood corresponds to the context in which the rule can be applied. So
the rule $B \Rightarrow C$ can be applied when we know just B, nothing more or less. This feature is inherited from
the semantics of the conditioning operator in probability theory. As a result, defaults and likelihoods cannot
have their antecedents arbitrarily specialised. That is, the $QD_P$ sentence

$$(B \Rightarrow C) \rightarrow (A \wedge B \Rightarrow C)$$

is not a theorem of QDP; so the context B cannot in general be specialised to include other information, in
this case A.

A second related feature of the logics is that there is no transitive relation applying to defaults or
likelihoods. The same holds for NP [1, Section 7]. That is, the $QD_P$ sentence

$$(A \Rightarrow B) \wedge (B \Rightarrow C) \rightarrow A \Rightarrow C$$

is not a theorem of QDP. For instance, a counterexample to this transitive sentence is that penguins are
birds, most birds fly, and penguins do not fly. So we would not expect the sentence to be a theorem. However,
if we are told that the yellow-bellied warbler is a bird, and know nothing else about it, it is quite plausible
to us that the warbler should fly.

So for plausible reasoning in certain situations, we would like some form of transitive reasoning. Notice
the $QD_P$ sentence

$$(A \Rightarrow B) \wedge (A \wedge B \Rightarrow C) \rightarrow A \Rightarrow C$$

is a theorem of QDP (T12' in fact). Suppose we can obtain some additional information that implies the
rule $B \Rightarrow C$ is the same as $A \wedge B \Rightarrow C$, so the condition A in the antecedent is not relevant. Then this
additional information together with theorem T12' shows the original transitivity form above does hold.
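
The penguin counterexample, and the restricted form of transitivity, can be checked numerically. The sketch below uses a hypothetical distribution over (penguin, bird, flies) events and assumes the quantitative reading in which the two premise errors add, which follows from $\Pr(C \mid A) \ge \Pr(B \mid A) \Pr(C \mid A \wedge B)$.

```python
from fractions import Fraction as F

# Hypothetical weights over (penguin, bird, flies) events.
dist = {
    (True, True, False): F(1, 100),    # penguins: birds that do not fly
    (False, True, True): F(95, 100),   # ordinary birds that fly
    (False, True, False): F(4, 100),   # other birds that do not fly
}


def pr(a):
    return sum(w for e, w in dist.items() if a(e))


def pr_given(b, a):
    """Pr(B|A), taken to be 1 when Pr(A) = 0."""
    pa = pr(a)
    return F(1) if pa == 0 else pr(lambda e: a(e) and b(e)) / pa


def default(a, b, eps):
    """A =>_eps B read as Pr(B|A) >= 1 - eps."""
    return pr_given(b, a) >= 1 - eps


penguin = lambda e: e[0]
not_penguin = lambda e: not e[0]
bird = lambda e: e[1]
flies = lambda e: e[2]
eps = F(1, 20)

# Plain transitivity: both premises hold, the conclusion does not.
premises_hold = default(penguin, bird, eps) and default(bird, flies, eps)
conclusion_holds = default(penguin, flies, 2 * eps)

# The restricted form needs (penguin and bird) => flies, which fails here.
restricted_premise = default(lambda e: e[0] and e[1], flies, eps)

# For non-penguins both restricted premises hold, and so does the conclusion
# with the summed error 2 * eps.
restricted_ok = (default(not_penguin, bird, eps)
                 and default(lambda e: not e[0] and e[1], flies, eps)
                 and default(not_penguin, flies, 2 * eps))
```

The restricted form blocks the unwanted conclusion for penguins precisely because its second premise is conditioned on the full context, not on birds in general.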

This ability to modify the antecedent of a default or plausible rule requires reasoning about relevance, 
where a condition in the antecedent is irrelevant if it can be added or deleted and still maintain the correctness 
of the rule. In probability theory, such information can be obtained in a number of ways. We can represent 
this information using the notion of independence, and in a more limited sense, following Neufeld et al. [20], 
the notion of favouring. 
Definition 4.6 Proposition A is independent of proposition B given proposition C if

$$\Pr(B \mid C) = \Pr(B \mid C \wedge A) .$$

Proposition A favours proposition B given proposition C if

$$\Pr(B \mid C) < \Pr(B \mid C \wedge A) .$$

Lemma 4.3 If proposition A is independent of proposition B given proposition C then the following
sentences of $QD_P$ are true:

$$C \Rightarrow B \leftrightarrow (C \wedge A) \Rightarrow B ,$$

$$C \Rightarrow A \leftrightarrow (C \wedge B) \Rightarrow A ,$$

$$C \rightsquigarrow A \leftrightarrow (C \wedge B) \rightsquigarrow A ,$$

$$C \rightsquigarrow B \leftrightarrow (C \wedge A) \rightsquigarrow B .$$

If proposition A favours proposition B given proposition C then the sentences above only hold for the forward
direction, that is, replacing “$\leftrightarrow$” by “$\rightarrow$”.

It should be clear from this lemma that methods for reasoning about relevance are vital in plausible
reasoning in order to modify plausible rules so they can be applied to each particular context. Some examples
are given in Section 5. Causal (or Bayesian) networks can be used for this form of reasoning, and the
maximum entropy method provides a way of making independence assumptions “by default” [7].


4.6 Consistency and consequence 

The question of whether a sentence from $D_P$ is consistent can be converted to the question of whether one of
a set of simplex problems in the $2^n$ variables $\{\Pr(e) \mid e \in E_P\}$ has a solution. Consequently, DP is decidable
(this is similar to Probabilistic Logic [32]). For the purposes of this paper, it is not worth obtaining axiom
schemata and rules of inference for the whole of DP, since we are really only interested in the case where
the errors are quite small. A system encompassing the whole of DP would most likely degenerate to the
kind found in [33, p10], where the schemata are close to an enumeration of primitive operations in the simplex
algorithm. Fortunately, a different approach is available. Adams [30, 19] has developed tests for consistency
and entailment in his conditional logic, which have been extended by Goldszmidt and Pearl [34]. Similar
consistency and consequence tests are presented below for the default and likelihood components of DP,
and are easily adapted to QDP. These results show that reasoning can be performed using the qualitative
system QDP, and the approximate error bounds of DP propagated concurrently. In particular, default
errors propagate additively.

Tests on consistency and consequence are presented below in terms of a clausal form. Consider the default
component of QDP. An arbitrary sentence containing the default, possibility and necessity operators can
be turned into a conjunction of clauses, where each clause has the form

$$\Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \Rightarrow B_i) \;\rightarrow\; \bigvee_{i \in I_C} (G_i \Rightarrow H_i) ,$$

for some index sets $I_V$, $I_A$, and $I_C$. Notice that all necessity and possibility operators have been gathered
in the antecedent of the clause, by converting $\neg \Box A$ to $\Diamond \neg A$ where necessary, and all the necessity operators
have been combined into one using theorem T3.

It is also of interest, though not essential for the development of this section, to consider a more precise 
interpretation of what it means for a clause to be a theorem in QDP . Lemma 4.4 uses the above clausal 
form to reinterpret the definition of a QDP theorem. 
Lemma 4.4

$$\models_{QDP} \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \Rightarrow B_i) \;\rightarrow\; \bigvee_{i \in I_C} (G_i \Rightarrow H_i) ,$$

if and only if there exists a $\delta$ and $\eta$ such that for all $\epsilon < \eta$

$$\models_{DP} \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \Rightarrow_\epsilon B_i) \;\rightarrow\; \bigvee_{i \in I_C} (G_i \Rightarrow_{\delta \epsilon} H_i) .$$

For the $D_P$ sentence in the lemma, $\delta$ is an error propagation factor, and $\delta \epsilon$ is an error propagation function,
which in this case is linear. The larger the value of $\delta$, the faster error can propagate when this particular
clause is applied in some chain of reasoning. By comparison, Adams’ notion of entailment corresponds to:

if and only if for all $\epsilon$ there exists a $\delta$ such that

$$\models_{DP} \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \Rightarrow_\delta B_i) \;\rightarrow\; \bigvee_{i \in I_C} (G_i \Rightarrow_\epsilon H_i) .$$

The difference between the two notions is that in QDP error is restricted to propagate linearly.

Likewise, we can convert sentences containing the likelihood, necessity and default operators to a clausal
form. The corresponding notion of a QDP theorem is given in Lemma 4.5.

Lemma 4.5

$$\models_{QDP} \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \rightsquigarrow^{n_i} B_i) \;\rightarrow\; \bigvee_{i \in I_C} (G_i \rightsquigarrow^{m_i} H_i) ,$$

if and only if there exists a $\delta$ and $\eta$ such that for all $e < \eta$

$$\models_{DP} \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \rightsquigarrow_{e^{n_i}} B_i) \;\rightarrow\; \bigvee_{i \in I_C} (G_i \rightsquigarrow_{\delta e^{m_i}} H_i) .$$

In this case, the error propagation functions, $\delta e^{m_i}$, are polynomial, and $\delta$ is the error propagation factor.
Since a smaller likelihood represents more room for error, the smaller the value of $\delta$, the faster error will
propagate when this particular clause is applied in some chain of reasoning.

Results below on consistency and consequence of clauses using the default operator are extensions of
several theorems in [30, 19], and similar extensions can be found in [34], although Adams’ terminology is not
used here. The extensions introduce necessity and possibility. Consistency turns out to be the operation on
which the three kinds of consequence tests are based.

Logical tests for consistency and consequence are given in Theorem 4.6 for clauses containing the default
operator. These are given for DP and, because the error propagation functions are linear, can be extended
to QDP simply by dropping the error subscripts.

Theorem 4.6 Consider the $D_P$ sentence D given by

$$\Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \Rightarrow_{\epsilon_i} B_i) ,$$

where $\epsilon_i < 1/2$ for $i \in I_A$. Let $I_{max}$ denote the (possibly empty) maximum subset of $I_A$, I, such that
$U \wedge (\bigvee_{i \in I} A_i) \wedge \bigwedge_{i \in I} (A_i \rightarrow B_i)$ is unsatisfiable. Such a maximum set exists and is unique.

1. The sentence D is inconsistent if and only if there exists some $j \in I_V$ such that $U \wedge V_j \wedge \bigwedge_{i \in I_{max}} \neg A_i$ is
unsatisfiable.

2. The $D_P$ sentence $C \Rightarrow_\delta E$ is a consequence of D for some $\delta < 1/2$ if and only if $D \wedge (C \Rightarrow \neg E)$ is
inconsistent. This holds if and only if D itself is inconsistent or $\Box \neg C$ is a consequence of D or there
exists some $I \subseteq I_A$ such that

$$U \wedge (C \vee \bigvee_{i \in I} A_i) \wedge \bigwedge_{i \in I} (A_i \rightarrow B_i) \wedge (C \rightarrow \neg E)$$

is unsatisfiable. If $C \Rightarrow_\delta E$ is a consequence for some $\delta$ and can be demonstrated so using I, then
$\delta = \sum_{i \in I} \epsilon_i$ is a correct error propagation function.

3. The $D_P$ sentence $\Diamond C$ is a consequence of D if and only if $D \wedge \Box \neg C$ is inconsistent.

4. The $D_P$ sentence $\Box C$ is a consequence of D if and only if D itself is inconsistent or

$$\models U \wedge \bigwedge_{i \in I_{max}} \neg A_i \rightarrow C .$$

When treating sets of sentences containing only proper defaults, the consistency test of part 1 has a special
case. The set is consistent only if $I_{max}$ is empty. If the set is inconsistent, then $I_{max}$ represents the maximum
set of proper defaults that could be considered the cause of the inconsistency. In this proper case, a necessity
can only be a consequence of a consistent set by standard logical deduction from other necessities. In general
however, with some improper defaults, whether a necessity is a consequence may depend on all elements of
the set including the defaults. Theorem T7' is an example.

The resultant algorithm for checking consistency of QDP sentences is given in Figure 3. 

Corollary 4.6.1 The defaults-consistency algorithm is correct and uses at most $|I_V| + |I_A|^2 / 2$ satisfiability
tests on the underlying propositional logic.
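
A consistency test of the kind described in Theorem 4.6, part 1, can be sketched as follows. This is not the algorithm of Figure 3, which is not reproduced here; it is a minimal illustration that assumes brute-force enumeration as the propositional satisfiability test and computes the maximum conflicting set by repeatedly discarding defaults whose antecedent is satisfiable together with U and the material implications of the remaining defaults. All function names and the example rule sets are hypothetical.

```python
from itertools import product


def satisfiable(formula, atoms):
    """Brute-force propositional satisfiability over the given atoms."""
    return any(formula(dict(zip(atoms, bits)))
               for bits in product((True, False), repeat=len(atoms)))


def maximal_conflict_set(u, defaults, atoms):
    """Largest index set I with U & (OR A_i) & AND (A_i -> B_i) unsatisfiable."""
    current = set(range(len(defaults)))
    while True:
        def material(w, idx=frozenset(current)):
            return u(w) and all((not defaults[i][0](w)) or defaults[i][1](w)
                                for i in idx)
        removable = {i for i in current
                     if satisfiable(lambda w, i=i: defaults[i][0](w) and material(w),
                                    atoms)}
        if not removable:
            return current
        current -= removable


def consistent(u, possibilities, defaults, atoms):
    """Part 1 of the theorem: test each possibility against U and the conflict set."""
    i_max = maximal_conflict_set(u, defaults, atoms)
    return all(
        satisfiable(lambda w, v=v: u(w) and v(w)
                    and not any(defaults[i][0](w) for i in i_max), atoms)
        for v in possibilities)


atoms = ("emu", "bird", "flies")
emu = lambda w: w["emu"]
u = lambda w: (not w["emu"]) or w["bird"]              # Emu -> Bird
emu_rules = [(emu, lambda w: not w["flies"]),          # Emu => not Flies
             (lambda w: w["bird"], lambda w: w["flies"])]  # Bird => Flies
clash = [(emu, lambda w: w["flies"]),                  # Emu => Flies
         (emu, lambda w: not w["flies"])]              # Emu => not Flies
```

With the possibility that emus exist added, `emu_rules` remains consistent, whereas the two directly conflicting defaults in `clash` do not.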

The third step of this algorithm also forms the basis of testing the consistency of sentences containing only
the necessity and possibility operators. Any such sentence can be converted to a conjunctive normal form
consisting of a disjunction of conjuncts of the form $\Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i$. Each conjunct can be tested for consistency
using the first step.

Corollary 4.6.2 Let the QDP sentence D containing only the necessity and possibility operators be in
conjunctive normal form, and let |D| denote the number of modal operators in the sentence. Then the
consistency of D can be determined using fewer than |D| satisfiability tests on the underlying propositional
logic.

The drawback with this result, however, is that the size of the conjunctive normal form of a sentence can be 
exponential in the size of the original sentence. 

Tests for consistency and consequence using the likelihood operator are given in Theorem 4.7. 

Theorem 4.7 Consider the Dp sentence D given by 

$$\Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \rightsquigarrow_{e_i} B_i) ,$$

where e, < for i € I a- Lei Imin denote the least subset of I A , I, such that U A , e j A< A Aj A Bj is 
satisfiable for all j € I a ~ L. Such a minimum set is unique. 

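1. The sentence D is inconsistent if and only if there exists some $j \in I_V$ such that $U \wedge \bigwedge_{i \in I_{min}} \neg A_i \wedge V_j$ is
unsatisfiable.

2. The $D_P$ sentence $C \rightsquigarrow_f B$ is a consequence of D for some $f < 1$ if and only if D is inconsistent or
there exists an ordered subset of the indices in $I_A - I_{min}$, $i_1, i_2, \dots, i_h$, possibly empty ($h = 0$), such
that for $j = 1, \dots, h$,

$$\models U \wedge \bigwedge_{i \in I_{min}} \neg A_i \wedge A_{i_j} \wedge B_{i_j} \wedge \bigwedge_{k < j} \neg A_{i_k} \rightarrow (C \wedge B) , \text{ and} \qquad (5)$$

$$\models U \wedge \bigwedge_{i \in I_{min}} \neg A_i \wedge \bigwedge_{k \le h} \neg A_{i_k} \rightarrow (C \rightarrow B) . \qquad (6)$$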

If consequence holds, then a lower bound on f, the error propagation function, is given by 

f > I ] , where e = min ei k , 

7 - Vi + ey l ^ h 
although the error propagation function can be linear in the $e_i$ in some cases.

3. The $D_P$ sentence $\Diamond C$ is a consequence of D if and only if $D \wedge \Box \neg C$ is inconsistent.

4. The $D_P$ sentence $\Box C$ is a consequence of D if and only if D is inconsistent or

$$\models U \wedge \bigwedge_{i \in I_{min}} \neg A_i \rightarrow C .$$

There is also a special case of this theorem that applies to non-iterated versions of the likelihood operator. 
Corollary 4.7.1 Consider the $QD_P$ sentence D given by

$$\Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \rightsquigarrow B_i) .$$

The $QD_P$ sentence $C \rightsquigarrow B$ is a consequence of D if D is inconsistent, or there exists some $I \subseteq I_A - I_{min}$
such that for $j \in I$,

$$\models U \wedge \bigwedge_{i \in I_{min}} \neg A_i \wedge A_j \wedge B_j \rightarrow (C \wedge B) , \text{ and} \qquad (7)$$

$$\models U \wedge \bigwedge_{i \in I} \neg A_i \rightarrow (C \rightarrow B) . \qquad (8)$$

I conjecture that the converse of this theorem also holds. The dual form of the corollary, converted to apply
to defaults, allows a disjunction of defaults to be the consequence of a single default. An example of this
corollary is theorem T17 and its dual.

An algorithm for checking consistency of QDP sentences is given in Figure 4. Step 2(b) has been added
to this to make the algorithm more efficient when some of the likelihood operators are proper.

Corollary 4.7.2 The likelihood-consistency algorithm is correct and uses at most $|I_V| + |I_A|^2 / 2$ satisfiability
tests on the underlying propositional logic.

An algorithm for checking consequence is given in Figure 5. This algorithm assumes the consistency check
has already been made. The error propagation function in this case can be taken from Theorem 4.7 part 2,
and a tighter error propagation function is given in the proof of that theorem.

Corollary 4.7.3 The likelihood-consequence algorithm is correct and uses at most $(|I_A| + 1)^2 / 2$ satisfiability
tests on the underlying propositional logic.


5 Applications 

This section demonstrates the use of the qualitative logic QDP on three anecdotal problems that recur
in the default reasoning literature.

The first example resolves the paradox of the “vanishing subclasses” . The second example demonstrates 
how reasoning about independence using causal networks can be integrated with the forms of plausible 
reasoning just developed. The final example is the classic Yale shooting problem [8]. This example highlights 
a subtle problem with the situation calculus when it is used for plausible reasoning. 

5.1 The “vanishing” emus 

Neufeld et al. have criticised the modelling of default reasoning based on infinitesimal probabilities [20, p123]
on the grounds that it makes “subclasses vanish”. Consider the following rules:

$$\mathit{Emu} \rightarrow \mathit{Bird} , \qquad (9)$$

$$\mathit{Emu} \Rightarrow \neg \mathit{Flies} , \qquad (10)$$

$$\mathit{Bird} \Rightarrow \mathit{Flies} . \qquad (11)$$

The following are consequences.

$$\mathit{Bird} \Rightarrow \neg \mathit{Emu}$$

$$\Rightarrow \neg \mathit{Emu}$$

We can conclude that “typically, birds aren’t emus” and “typically, things aren’t emus”. To show the first
is a consequence using Theorem 4.6, notice $U = (\mathit{Emu} \rightarrow \mathit{Bird})$, $I_V = \emptyset$, and try to show the rules together
with $\mathit{Bird} \Rightarrow \mathit{Emu}$ are inconsistent. This follows because the rules themselves are consistent and

$$U \wedge (\mathit{Emu} \vee \mathit{Bird}) \wedge (\mathit{Emu} \rightarrow \neg \mathit{Flies}) \wedge (\mathit{Bird} \rightarrow \mathit{Flies}) \wedge (\mathit{Bird} \rightarrow \mathit{Emu})$$

is unsatisfiable.
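
The unsatisfiability claim is small enough to verify by enumerating all eight truth assignments. The following sketch does so; the function names are illustrative, and brute-force enumeration stands in for whatever propositional satisfiability test is actually used.

```python
from itertools import product


def implies(a, b):
    """Material implication a -> b."""
    return (not a) or b


def unsatisfiable(formula, n_atoms):
    """True if no truth assignment to n_atoms atoms satisfies the formula."""
    return not any(formula(*bits)
                   for bits in product((True, False), repeat=n_atoms))


def rules_only(emu, bird, flies):
    """U and the two defaults as material implications, with a live antecedent."""
    u = implies(emu, bird)
    return (u and (emu or bird)
            and implies(emu, not flies)
            and implies(bird, flies))


def rules_with_denial(emu, bird, flies):
    """The same formula with (Bird -> Emu) added, as in the text."""
    return rules_only(emu, bird, flies) and implies(bird, emu)
```

Without the added conjunct the formula is satisfied by a flying bird that is not an emu, so the conflict comes entirely from supposing that birds are typically emus.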

If we take the $O(\epsilon)$ semantics of the default operator literally then we could conclude, since $\epsilon$ is
infinitesimal, that “no birds are emus”, or “nothing is an emu”. The real intent of the semantics, however, is about
approximations for $\epsilon$ small. So instead we should conclude that the emu is just an uncommon or non-typical
bird, which in reality is true of emus. The approximate probabilistic semantics does not cause subclasses to
vanish; but it may cause you to deduce some subclasses must be non-typical.


5.2 Can Joe read and write?

The importance of independence in default reasoning, and plausible reasoning generally, has been underlined
by Pearl in his simple problem “can Joe read and write?” [7, Sec. 10.3]. This is a good example of why
general transitivity should not hold for default reasoning. A twist is also given at the end to show how 
likelihood reasoning can complement default reasoning. 

Pearl introduces the propositions (I have altered the symbols)

Over-7: Joe is over 7 years old,
RdWr: Joe can read and write,
EngPrf: Joe’s father is a Professor of English,
Shakes: Joe can recite passages from Shakespeare,

and the default rules (expressed in QDP)

$$\mathit{RdWr} \Rightarrow \mathit{Over\text{-}7} , \qquad \mathit{EngPrf} \Rightarrow \mathit{RdWr} , \qquad \mathit{Shakes} \Rightarrow \mathit{RdWr} . \qquad (12)$$

Let $\Delta_{literacy}$ denote rule set (12). Pearl also assumes that Joe is over 6 years old and is not retarded, so
that the default rules above seem reasonable.

Given, in addition, that Joe recites Shakespeare, Pearl argues that a reasonable conclusion is that Joe is 
over seven years old. That is, we want to be able to infer the default rule 


Shakes => Over-7 . (13) 

On the other hand, given that Joe’s father is a Professor of English, it is not a reasonable conclusion that Joe
is over seven years old. One argument is that Joe’s father’s profession adequately explains Joe’s literacy,
so we don’t need the more common explanation that Joe is over seven years old. We do not want to be able
to infer the default rule

$$\mathit{EngPrf} \Rightarrow \mathit{Over\text{-}7} . \qquad (14)$$

The problem with the formulation at present is that the constraints on Shakes and EngPrf are syntactically
identical, but we hope to infer conflicting default rules for them. In QDP (and in NP) it happens
that neither default rule (13) nor (14) can be derived. We do get, however, that

$$\Delta_{literacy} \models_{QDP} \mathit{EngPrf} \Rightarrow \mathit{Over\text{-}7} \leftrightarrow (\mathit{EngPrf} \wedge \mathit{RdWr}) \Rightarrow \mathit{Over\text{-}7} ,$$

$$\Delta_{literacy} \models_{QDP} \mathit{Shakes} \Rightarrow \mathit{Over\text{-}7} \leftrightarrow (\mathit{Shakes} \wedge \mathit{RdWr}) \Rightarrow \mathit{Over\text{-}7} . \qquad (15)$$

The problem as it stands is underconstrained. So, what information is missing?

Pearl’s solution to the problem introduces the notion of causality. For instance, Joe’s literacy is a partial 
cause (and the only direct one occurring in the formulation) of Joe being able to recite Shakespeare. What 
Pearl alludes to but never explicitly mentions is the causal network (a Directed-Acyclic Graph (DAG) [35]) 
given in Figure 6. In this network, arcs correspond to the intuitive notion of “can cause”. 

As Pearl and Verma explain, such a causal network provides information about independence [35, defi- 
nition for DAGD, p376]. It should be pointed out that the notion of causality is merely incidental to their 
analysis: it serves as a useful, intuitive focus for acquiring knowledge about independence. We can subse- 
quently apply the dependence information so obtained to default and likelihood reasoning using Lemma 4.3. 

Applying Pearl and Verma’s technique of deducing independence relations to Figure 6, we get that Joe’s
Shakespearean recital is independent of Joe being over seven, given he is literate. In QDP, it follows that

$$\mathit{RdWr} \Rightarrow \mathit{Over\text{-}7} \leftrightarrow (\mathit{Shakes} \wedge \mathit{RdWr}) \Rightarrow \mathit{Over\text{-}7} .$$

Let us denote by $\Gamma$ the dependence information obtainable from Figure 6. Together with the default
conclusion (15), we get

$$\Delta_{literacy} \cup \Gamma \models_{QDP} \mathit{Shakes} \Rightarrow \mathit{Over\text{-}7} .$$
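
The quantitative counterpart of this step can be sketched as a short derivation. It assumes, purely for illustration, that every rule in (12) carries the same default error $\epsilon < 1$ and that $\Pr(\mathit{Shakes}) > 0$; neither assumption is stated in the problem itself. The third line uses the independence read off the causal network.

```latex
\begin{align*}
\Pr(\mathit{Over\text{-}7} \mid \mathit{Shakes})
  &\ge \Pr(\mathit{Over\text{-}7} \wedge \mathit{RdWr} \mid \mathit{Shakes}) \\
  &= \Pr(\mathit{RdWr} \mid \mathit{Shakes}) \,
     \Pr(\mathit{Over\text{-}7} \mid \mathit{Shakes} \wedge \mathit{RdWr}) \\
  &= \Pr(\mathit{RdWr} \mid \mathit{Shakes}) \,
     \Pr(\mathit{Over\text{-}7} \mid \mathit{RdWr}) \\
  &\ge (1 - \epsilon)^2 \;\ge\; 1 - 2\epsilon .
\end{align*}
```

Under these assumptions the derived default holds with error at most $2\epsilon$, in line with the additive propagation of default errors noted in Section 4.6.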

The same does not hold for EngPrf, however, because in contrast we get that Joe’s father’s profession
is not independent of Joe being over seven, given Joe is literate. Because of this, the truth or falsehood of
default rule (14) is undetermined from $\Delta_{literacy}$ and $\Gamma$. But, if we were also told that it is likely for a child
of a Professor of English to be under seven years old and literate, then default rule (14) becomes false as
required. That is⁴,

$$\Delta_{literacy} \cup \{ \mathit{EngPrf} \rightsquigarrow (\mathit{RdWr} \wedge \neg \mathit{Over\text{-}7}) \} \models_{QDP} \neg (\mathit{EngPrf} \Rightarrow \mathit{Over\text{-}7}) .$$


5.3 The Yale shooting problem 

A second problem that needs to incorporate independence for a solution is the Yale shooting problem [8]. 
This problem has been the subject of considerable discussion in AI, and it is beyond the scope of this paper 
to give a reasonable survey. In this section, the specific solution of Delgrande [36, Section 6.2] is considered. 
In probabilistic reasoning it is important to differentiate between what is currently known, and what is 
not. However, the situation calculus, in which the Yale shooting problem is usually presented, allows the 
representation of knowledge about static properties of a state but represses the representation of knowledge 
about events. This causes problems in the subsequent representation of defaults, which we discuss below. 

The Yale shooting problem can be presented briefly as follows: a gun is loaded; one waits for a moment; 
a shot is fired. We should conclude by default that the person is dead, assuming, of course, the gun was well 
aimed at the person, etc. Early default reasoning systems could not draw this conclusion; during the wait, 
the gun would not stay loaded by default. 

Delgrande [36, Section 6.2] initially suggested a situation calculus representation of this problem in NP 
that in QDP becomes: 


\[ \Box T(\mathit{Alive}, S_0) , \]


4 Derive this result as follows. From EngPrf a>- ( RdWr A -^Over-7), the conditioned version of theorem T10, and the 
conditioned version of the theorem given in Equation (2), infer -»( EngPrf A RdWr => Over-7). Finally, combine this with 
default rule (15). 
\[ \Box T(\mathit{Loaded}, \mathit{Result}(\mathit{Load}, s)) , \]

\[ T(\mathit{Loaded}, s) \Rightarrow T(\mathit{Dead}, \mathit{Result}(\mathit{Shoot}, s)) , \]

\[ T(f, s) \Rightarrow T(f, \mathit{Result}(e, s)) . \]

Variables are given by $e$, $f$ and $s$, and state $S_0$ is some constant starting state. The first sentence reads 
“Alive is necessarily true in state $S_0$”, the third “if Loaded is true in some state $s$ then typically Dead will be 
true in the state resulting from a Shoot in state $s$”, etc. Assume the state resulting from $S_n$ is denoted $S_{n+1}$. To adequately 
handle the shooting problem we now wish to infer that, contingent on a certain sequence of events taking 
place, a death will occur. 


\[ T(\mathit{Load}, S_0) \wedge T(\mathit{Wait}, S_1) \wedge T(\mathit{Shoot}, S_2) \Rightarrow T(\mathit{Dead}, S_3) . \]

As Delgrande points out, this formulation cannot be correct. From the second sentence and theorem T6' 
we get 

\[ \neg T(\mathit{Loaded}, s) \Rightarrow T(\mathit{Loaded}, \mathit{Result}(\mathit{Load}, s)) , \]
and together with an instance of the fourth sentence ($f = \neg\mathit{Unloaded}$), 


\[ \neg T(\mathit{Loaded}, s) \Rightarrow \neg T(\mathit{Loaded}, \mathit{Result}(\mathit{Load}, s)) , \]


from theorem T7' we get 

\[ \Box T(\mathit{Loaded}, s) . \]

That is, the gun is always loaded! If we added an Unload event to the above formulation that resulted in 
the gun being unloaded, we could similarly deduce that the gun is always unloaded! 
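
The step through theorem T7' can be read probabilistically. The following is only a sketch: it assumes that a default $C \Rightarrow_\epsilon A$ in DP is interpreted as $Pr(A \mid C) \geq 1 - \epsilon$ whenever $Pr(C) > 0$ (the precise definition is given earlier in the paper and is not restated here), and the error symbols $\epsilon$ and $\delta$ are introduced purely for illustration.

```latex
% Assumed reading: C \Rightarrow_\epsilon A  means  Pr(A | C) >= 1 - \epsilon  when Pr(C) > 0.
% Take C = \neg T(Loaded, s) and A = T(Loaded, Result(Load, s)).
\begin{align*}
  Pr(A \mid C) &\geq 1 - \epsilon , \qquad Pr(\neg A \mid C) \geq 1 - \delta \\
  1 = Pr(A \mid C) + Pr(\neg A \mid C) &\geq 2 - \epsilon - \delta
  \quad\Longrightarrow\quad \epsilon + \delta \geq 1 .
\end{align*}
% For small errors (\epsilon + \delta < 1) this is impossible, so Pr(C) = 0,
% which is the probabilistic content of \Box \neg C, here \Box T(Loaded, s).
```

Under that reading, two conflicting defaults with the same antecedent can only both hold with small error if the antecedent has probability zero, which is why the gun comes out as "always loaded".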

Delgrande suggests repairing this conflicting state of affairs by changing the last sentence to (assuming 
that equality is introduced) 


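\[ (f = \mathit{Alive}) \vee (T(f, s) \Rightarrow T(f, \mathit{Result}(e, s))) , \]

\[ (e = \mathit{Shoot}) \vee (T(\mathit{Alive}, s) \Rightarrow T(\mathit{Alive}, \mathit{Result}(e, s))) , \]

which together say people do not tend to remain alive if they are shot, or changing the second last to 

\[ (T(\mathit{Alive}, s) \wedge T(\mathit{Loaded}, s)) \Rightarrow T(\mathit{Dead}, \mathit{Result}(\mathit{Shoot}, s)) . \]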

In addition, we will have to take this kind of evasive action for every event type. Adopting the first strategy, 
the simple concept “things tend to stay the same” is starting to look decidedly lengthy. We are required to 
explicitly detail all those exceptions default reasoning is supposed to circumvent. The second strategy seems 
to introduce an unnecessary complication: if you shoot a dead person they will remain dead, so why bother 
specifying they should be alive before the shooting? 

The real problem lies with the representation of knowledge about events. Without knowing which event 
occurs at a state, we know things will tend to stay the same. Once we know which particular event occurs, 
however, we also know for sure that certain things will change. The antecedents in the conditionals in 
Delgrande’s formulation need to be qualified with knowledge about events to block the conflict between 
the second and fourth sentences. We do this by modifying the sentences to allow explicit representation of 
knowledge about events: 


\[ \Box T(\mathit{Alive}, S_0) , \]

\[ \Box (T(\mathit{Load}, s) \rightarrow T(\mathit{Loaded}, \mathit{Next}(s))) , \]

\[ T(\mathit{Loaded}, s) \wedge T(\mathit{Shoot}, s) \Rightarrow T(\mathit{Dead}, \mathit{Next}(s)) , \]

\[ T(f, s) \Rightarrow T(f, \mathit{Next}(s)) , \]

where $T(e, s)$ for an event $e$ such as Shoot denotes that it is known that the event $e$ occurred in situation 
$s$, and $\mathit{Next}(s)$ denotes the state after state $s$. Denote this set of sentences by $\Delta_{shooting}$. 

But with the problem as formulated in $\Delta_{shooting}$, the required result is not forthcoming. Again we need 
information about relevance to show how the redrafted sentences can have their antecedents sufficiently 
specialised. 

First, the following can be inferred from the third rule in $\Delta_{shooting}$ given that if a loaded gun is shot at 
someone, then events strictly prior to the shooting are independent of possible death, 

\[ T(\mathit{Load}, S_0) \wedge T(\mathit{Wait}, S_1) \wedge T(\mathit{Shoot}, S_2) \wedge T(\mathit{Loaded}, S_1) \wedge T(\mathit{Loaded}, S_2) \Rightarrow T(\mathit{Dead}, S_3) . \]

Second, the following can be inferred from the fourth rule in $\Delta_{shooting}$ given that whether a gun stays loaded 
is only dependent on prior Unload or Shoot events, 

\[ T(\mathit{Load}, S_0) \wedge T(\mathit{Wait}, S_1) \wedge T(\mathit{Shoot}, S_2) \wedge T(\mathit{Loaded}, S_1) \Rightarrow T(\mathit{Loaded}, S_2) . \]

This information about independence, call it $\Gamma_2$, is sufficient to yield the required result. 

\[ \Delta_{shooting} \cup \Gamma_2 \models_{QDP} T(\mathit{Load}, S_0) \wedge T(\mathit{Wait}, S_1) \wedge T(\mathit{Shoot}, S_2) \Rightarrow T(\mathit{Dead}, S_3) . \]

Notice that $\Gamma_2$ could have been obtained automatically using the default independence assumptions of 
maximum entropy. 
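
To see why the two independence-derived rules are enough, it may help to chain them numerically. The sketch below is illustrative rather than part of the original argument: it assumes the DP-style reading in which a default with error $\epsilon$ means a conditional probability of at least $1 - \epsilon$, and the abbreviations $E$, $L_1$, $L_2$, $D_3$ and the error symbols $\epsilon$, $\delta$ are introduced here for convenience.

```latex
% Abbreviations (assumed for this sketch):
%   E   = T(Load, S_0) \wedge T(Wait, S_1) \wedge T(Shoot, S_2)
%   L_k = T(Loaded, S_k),   D_3 = T(Dead, S_3)
% The necessity rule for Load gives Pr(L_1 | E) = 1.
% Assume the "stays loaded" rule holds with error \epsilon and the
% "shooting kills" rule holds with error \delta.
\begin{align*}
  Pr(D_3 \mid E) &= Pr(D_3 \mid E \wedge L_1) \\
                 &\geq Pr(L_2 \mid E \wedge L_1) \, Pr(D_3 \mid E \wedge L_1 \wedge L_2) \\
                 &\geq (1 - \epsilon)(1 - \delta) \;\geq\; 1 - \epsilon - \delta .
\end{align*}
```

So, under these assumptions, the conclusion inherits an error of at most $\epsilon + \delta$, which stays small when both rules are reliable.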

The specification of $\Gamma_2$ can be seen to involve as much detail as Delgrande’s earlier suggestion. So where 
is the advantage? The defaults remain in a simple form, and the exceptions are instead coded in the modular 
form of causal (independence) information about events. 


6 Further Comparisons 

This section compares the logics DP and QDP with some related approaches. Halpern and Rabin’s and 
Halpern and McAllester’s likelihood logics, and Neufeld et al.’s influence graphs are compared because they 
have also been motivated by probability. Comparisons with Adams’ conditional logic have been sprinkled 
throughout Section 4, and are not reiterated here. The last comparison given here is with Delgrande’s NP; 
this system had an historical influence on the logics DP and QDP. 

6.1 Halpern and Rabin’s likelihood logic 

Halpern and Rabin propose the unary likelihood operator L with semantics [14, p386] 


Lp is best thought of as saying “p is reasonably likely to be a consistent hypothesis.” 

This should not be confused with “p is reasonably likely”, the interpretation Halpern and McAllester give 
to Lp [9, p5]. 

For instance, suppose a lottery with 1,000,000 tickets is being held, then the following can be deduced 
by applying their Axiom AX6 repeatedly: 

\[ L(\text{someone will win the lottery}) \leftrightarrow \bigvee_{i=1}^{1{,}000{,}000} L(\text{person } i \text{ will win the lottery}) . \qquad (16) \]

The right hand side of this equivalence reads: there exists a particular person who is likely to win the lottery. 
In the Oxford dictionary sense of the word “likely”, this is certainly not true before the lottery is held. So in 
the Halpern-McAllester interpretation, the sentence (16) above can be interpreted as true $\leftrightarrow$ false. This is 
a variant of the lottery “paradox”. Because they assume that likelihood reasoning is precise, they conclude 
that the Halpern-Rabin interpretation must be more appropriate. 

By contrast, in the framework proposed here it is taken for granted that likelihood reasoning may be 
imprecise. As explained after Definition 4.5, QDP suffers from the lottery “paradox” in a sense, but it is 
viewed as an anomaly, an inherent consequence of modelling imprecise reasoning with a precise logic. Of 
course, such anomalies can be avoided by either using heuristics about plausible reasoning (“don’t make too 
many assumptions”), or by resorting to numeric methods which allow more careful tallying of degrees of 
imprecision. 
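
A small numeric tally shows how the anomaly arises and how tracking errors exposes it. This is a hedged sketch, not part of the logics themselves: the function name `lottery_tally`, the tolerance `eps`, and the assumption of exactly one winning ticket drawn uniformly are all introduced here for illustration.

```python
from fractions import Fraction

def lottery_tally(n_tickets, eps):
    """Tally the error of concluding "no ticket wins" from n per-ticket defaults.

    Assumes exactly one ticket wins, drawn uniformly (an illustrative assumption).
    Returns (each_default_ok, tallied_error, guarantee).
    """
    p_win = Fraction(1, n_tickets)
    # "Ticket i loses" holds with probability 1 - p_win, so it is acceptable
    # as a default whenever its error p_win is within the tolerance eps.
    each_default_ok = p_win <= eps
    # Union bound: the errors of the n separate conclusions add up.
    tallied_error = min(Fraction(1), n_tickets * p_win)
    # What is left of the guarantee for the conjunction of all n conclusions.
    guarantee = 1 - tallied_error
    return each_default_ok, tallied_error, guarantee

ok, error, guarantee = lottery_tally(1_000_000, Fraction(1, 1000))
```

Each individual conclusion is well within tolerance, yet the tallied error for the conjunction reaches 1, so nothing is guaranteed about all tickets losing. A purely qualitative logic loses this information; a numeric tally keeps it.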

Notice that interpreting $Lp$ to mean “$p$ is a consistent hypothesis” yields the following transformation to 
QDP: $Lp \mapsto \Diamond p$, and $Gp \mapsto \Box p$. Indeed, their axioms on non-iterated modalities each have a corresponding 
theorem in QDP. 

Halpern and Rabin propose instead that iterated modalities of the form L l Gp be used to model “p is 
reasonably likely”, and they give soundness results to support their claim. There is, however, a serious 
methodological problem with this approach: knowledge expressed in the form they propose is non-modular 
and cumbersome. A sentence such as “Pi is reasonably likely given P 3 ” is represented in QDP simply as 
P 3 Pi. In their logic it translates to 

—>G—'Pi A -'GP 2 A -^GPa A GP 3 => LGP\ 


in one situation, and 

~iGP\ A *iG"’>p 2 A ~*GP± A GP 3 => LGP\ 

in another. These cumbersome translations occur because, as they explain [9, p7], their representation has 
no means of making likelihood contingent on what is currently known (for instance, by using conditioning, 
the role played by the left hand side of the “s$-” operator). Worse still, if the model (and consequently 
the atomic propositions used) becomes extended, the appropriate translation must be extended as well. In 
addition, Halpern and Rabin give no evidence that non-trivial theorems hold about iterated modalities of 
the form L'G. 

6.2 Neufeld and Poole’s favouring formalism 

Neufeld et al. present influence graphs, a qualitative system for reasoning about favouring [20] that is related 
to Suppes’ causal algebra [37]. B favours A when $Pr(A \mid B) > Pr(A)$. It was argued in Section 4.5 that 
favouring provides an important complement to the logics presented here. 

Favouring alone, however, is not sufficient information on which to base a decision. This stems from the 
fact that favouring is for reasoning about shift in belief and not current strength in belief. For instance, it 
is well known now that smoking favours cancer (that is, a smoker is more likely to have cancer than a non- 
smoker). But the knowledge that a person smokes is not sufficient evidence on which to base a conclusion 
that the person has cancer. It merely provides an additional degree of support for such a conclusion. 
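
The gap between shift in belief and strength of belief can be made concrete with a toy calculation. The counts below are hypothetical, invented purely for illustration, and the tolerance of 0.1 used to stand in for "acceptable as a default" is likewise an assumption of this sketch.

```python
def conditional(joint, event, given):
    """Pr(event | given) from a dict mapping outcomes to counts."""
    numerator = sum(c for o, c in joint.items() if given(o) and event(o))
    denominator = sum(c for o, c in joint.items() if given(o))
    return numerator / denominator

# Hypothetical population counts: (smoker, cancer) -> number of people.
JOINT = {
    (True, True): 10,
    (True, False): 190,
    (False, True): 4,
    (False, False): 796,
}

smoker = lambda o: o[0]
cancer = lambda o: o[1]
anyone = lambda o: True

p_cancer = conditional(JOINT, cancer, anyone)               # prior belief
p_cancer_given_smoker = conditional(JOINT, cancer, smoker)  # belief after evidence

# Favouring: the evidence raises the probability ...
favours = p_cancer_given_smoker > p_cancer
# ... but the resulting belief is still far too weak to act as a default
# (assumed tolerance: error at most 0.1).
supports_default = p_cancer_given_smoker >= 1 - 0.1
```

With these made-up numbers, smoking favours cancer, yet the conditional probability remains small, so no default conclusion "smokers have cancer" is warranted.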

6.3 Delgrande’s conditional logic NP 

There is a strong correspondence between the theorems of QDP and Delgrande’s NP [1]. The only axiom 
of NP that is not also a theorem of QDP is the CV axiom given by 

\[ \neg(A \Rightarrow B) \rightarrow ((A \Rightarrow C) \rightarrow ((A \wedge \neg B) \Rightarrow C)) , \]

although this is similar to the QDP theorem T14'. Notice, however, that, by adapting T14' we get that 

^dp “*(A => t 6/2 B) {A=> e C) — > (A A 1 ]?) =>« C . 

This version of the CV axiom does not also become a theorem in QDP because the first default has an error 
that is a different order of magnitude to the second two defaults (e6 compared with c and 5). 
Also, necessity is introduced into QDP and NP in a very different manner. Nevertheless, theorems 
involving necessity in NP given in [1] are also theorems for QDP. Consequently, almost every theorem of 
NP that is a sentence of QDP is also a theorem of QDP. 


7 Conclusion 

This paper has examined the problem of reasoning about defaults and likelihood from a probabilistic perspec- 
tive. The presentation has been one of theoretical analysis, comparison with existing systems, and review 
of anecdotal examples. The approach developed has extended some existing systems [19,7,1] and put some 
others in a clearer perspective [14]. This highlighted the approximate nature of the reasoning forms, the 
duality between them, and the need for complementary reasoning about relevance and error propagation. 
Algorithms have also been presented for determining some types of consistency and consequence for both 
logics, qualitative and quantitative. 

The following research issues give some idea of how this area might be further developed. 

Causality, independence [7] and favouring [20] play a complementary but vital role to default and likeli- 
hood reasoning. They help in the determination of relevance, for the derivation of plausible rules applicable to 
a system’s current context. Suppose we have separate information about relevance and defaults. How might 
reasoning about both these forms be integrated? For instance, how can the consistency and consequence 
algorithms be interfaced with algorithms for reasoning about independence? 

There is a remarkable similarity between Delgrande’s conditional logic NP and the probabilistic logics 
presented here. With the necessity and possibility operators, the logics presented here have an ability to 
express sentences roughly in the realm of autoepistemic or default logics. What are the relationships to these 
other approaches? 

How should the effect of the decision context be integrated? For instance, one would like to be able to 
obtain the default reasoning structure present in the layered control systems of Brooksian robots [38], where 
each layer is intended to handle a different class of decision problems. How might these layered systems be 
developed? 

Given that defaults and likelihoods have been represented here as probabilistic rules, how might they 
be learned from data? Machine learning techniques for rule induction have been developed, but these allow 
only one particular propositional symbol (or concept) in the consequence of the rule. Some methods are 
described in [39,40,25]. To learn a set of defaults and likelihoods, more general approaches are required that 
simultaneously learn rules with a variety of different propositional symbols in the consequence, as found in 
[41]. 

At what point does qualitative reasoning of QDP have to be augmented with quantitative reasoning of 
DP to produce reliable results? Furthermore, when do the approximations inherent in DP break down so 
that a system needs to be developed using more thorough probabilistic reasoning? It may be necessary to 
reason about uncertainty using approximate numeric techniques, and to use the plausible logics developed 
here merely at the man-machine interface. For instance, one observable use of default and likelihood reasoning 
in people is explanation and presentation of results. 

Implementation and application to real problems is clearly one important way to explore these plausible 
reasoning forms further. 
Appendix Proofs of Lemmas and Theorems 

Proof of Theorem 4.1 First, substitution instances of propositional logic hold for DP because the 
interpretation of theorems for DP is given in terms of propositional logic (“not”, “and”, etc.). Suppose 
E eQDp is some substitution instance of a theorem of propositional logic. Consider E' G Dp obtained by 
transforming => to =>* and s$- n to for some e and e. E' is also a substitution instance of propositional 
logic, so it is a theorem of DP. As this holds for arbitrary e and e, E is a theorem of QDP. 

Second, equivalences can be substituted. Suppose $Pr(A \leftrightarrow B) = 1$, then 

\[ Pr(C(A)) = Pr(C(A) \wedge (A \leftrightarrow B)) = Pr(C(B) \wedge (A \leftrightarrow B)) = Pr(C(B)) , \]

and the result follows from the definition of a theorem in DP. A similar proof applies for QDP. □ 

Proof of Lemma 4.2 Any sentence that holds for an arbitrary probability distribution must hold for 
an arbitrary probability distribution conditioned on some $C$ given that $Pr(C) > 0$. That is, given that 
$Pr(C) > 0$, we can make the transformations 

\[ Pr(A) \mapsto Pr(A \mid C) \]

\[ Pr(A \mid B) \mapsto Pr(A \mid B \wedge C) \]

and the sentence must still hold for any arbitrary probability distribution. This corresponds to the 
transformations given in the lemma. Notice there is no confusion in applying the transformations because the 
operators do not nest. □ 

Proof of Lemma 4.4 First, notice that the order of magnitude definition of Definition 4.5 applies if and 
only if there exists constants c, for * € I a and d. for i € lc and rj such that for all e < 17 and probability 
distributions Pr, 

DU A »€/v Ai — Bi * ^dj* • \ ) 

To show the only if part of the theorem, assume the 6,c condition in the lemma holds. Then let a = 1 
for t G I a and d, = <5 for i E lc, so by above, the clause is a theorem of QDP. 

To show the if part of the theorem, assume the clause is a theorem of QDP so constants c, for » G I a 
and di for t G Ic and V exist as above. Let a = minj € j A c,-, and S — max i€ i c di/a, and rf = Jfa. Pick any 
< xf and note e = e'/a < 77 . We now have that 1 - c,e < 1 - e 7 , and 1 - d,e > 1 - 6S. So if a probability 
distribution Pr satisfies the clause (17) using i] and £, then it also satisfies the D P clause in the theorem 
using T}' and e'. So this clause is satisfied for every distribution. □ 

Proof of Lemma 4.5 The proof proceeds as for Lemma 4.4 but using 

|= Pr DU A i € j v oVi A , e / A Ai — ♦ V, e / C G, Hi , 

instead of formula 17. There is a difference in showing the if part of the theorem; 6 is now constructed m an 
inverse manner. Let 

di 

a = maxi£j c y/ci and S — mirii^i c , 

and proceed as before, noticing that we are dealing with quantities such as £(e') n< rather than l -SS. O 

Proof of Theorem 4.6 part 1 First, we shall prove $I_{max}$ exists and is unique. Construct $I_{max}$ as follows. 
Let $I = I_A$. If there exists a $j \in I$ such that $U \wedge A_j \wedge \bigwedge_{i \in I} (A_i \rightarrow B_i)$ is satisfiable, then this also holds for any 
subset of $I$ containing $j$, so $j$ cannot be in any $I_{max}$, irrespective of uniqueness. So remove $j$ from $I$ and notice 
if any $I_{max}$ exists, irrespective of uniqueness, it must still be a subset of this new $I$. Repeat this process 
until $I$ cannot be further decreased in size. So now $U \wedge (\bigvee_{i \in I} A_i) \wedge \bigwedge_{i \in I} (A_i \rightarrow B_i)$ is unsatisfiable, or $I = \emptyset$. 
By construction, any $j \in I_A - I$ cannot be in any $I_{max}$, so letting $I_{max} = I$ we have that maximality and 
uniqueness hold. 

Second, assume D is consistent and we shall prove the unsatisfiability condition of part 1 fails. Notice 
that if E — *• F then Pr(F) > Pr(E) for any probability distribution Pr. So from the definition of 
for any probability distribution such that Pr(U) = 1, Pr(V,g/ w „Ai A -»B,-) > Pr(V,gj m#B A,). Therefore, 


wxieU.. P r (Ai A ->B,-) > 


\Imax t 


Pr(V, € / m ..A,- A-iB,) 
\Imax\ 


Pr(Aj) 

\Imax\ 


for any j. Therefore either Pr(Aj) = 0 for all j G or for at least one j G I a, Pr(-»Bj|Aj) > j ™ — -y > 

If Pr demonstrates D is consistent, then the second option must fall, so Pr(Aj) = 0 for all j G Imax • Since 
Pr(Vj ) > 0 for j G Iv, then it must follow that Pr ( U A Vj A "’A,- ) > 0, and the unsatisfiability condition 
must fail. 

Third, assume the unsatisfiability condition fails and we shall prove D is consistent. By the definition of 
Imax there exists an ordering of I A — Imax given by »i, . . . , im, such that U A -'A,- A A ti A (A,* — ♦ Bi k ) 
is satisfiable for j = 1, . . . , m. Let truth assignment tj demonstrate this satisfiability for j = 1, . . . m. Also let 
truth assignment tj demonstrate the satisfiability of U A Vj A,€ 7 m «. “•A,* for j G Iv- These second assignments 
exist because the satisfiability condition fails. Now define the probability distribution Pr as 


Pr(C) = Y, (1 -«.)<»(<?) n £i ' + 

ksl,...,m i=l,...,Jk — 1 


Il/=l T ...,ni C| i 

\Iv\ 


E - 

j£lv 


where the truth assignment t(C ) takes the value 1 if C is satisfied by t, and 0 otherwise. By construction, 
this is a well-defined probability distribution that satisfies all the right inequalities for arbitrary e, <1. □ 


Proof of Theorem 4.6 part 2 To show the only if part of the theorem, assume D A (C ~'P) is 
consistent. If C ~ | B with 6 < ^ then clearly ~<(C =>< B), so D A *■*( C =>6 B) is consistent, so it must be 
false that \=dp D (C B). 

To show the if part of the theorem, assume D A (C ~»B) is inconsistent. By formula 4, to show 
C =>s B is a consequence of D it is sufficient to show it is a consequence of D A oC. If D is inconsistent, 
then consequence follows by default. If D A oC is inconsistent then consequence follows as well. So assume 
DAoC is consistent, by part 1 of the theorem and the original inconsistency assumption, it must follow that 

(= -i(I7 A (VigyA; V C) Ai € 7 (A, — ► B,) A (C — » “’B)) , 

for some J C I A . From the second half of the proof for Theorem 4.6 part 1, this follows for any 6 < 1, not 
just 6 < jAj. Noting that (V.g/A,- V C) is equivalent to (V, € /A; V C A i€ j -<A,-) and taking this disjunction 
out through the negation, it follows that 

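\[ \models U \wedge (\bigvee_{i \in J} A_i) \wedge \bigwedge_{i \in J} (A_i \rightarrow B_i) \rightarrow C \wedge B , \]

\[ \models U \wedge C \wedge \neg B \rightarrow \bigvee_{i \in J} A_i \wedge \neg B_i . \]

Notice that if $E \rightarrow F$ then $Pr(F) \geq Pr(E)$ for any probability distribution $Pr$. So for any distribution 
$Pr$ such that $Pr(U) = 1$, 

\[ Pr(C \wedge B) \geq Pr((\bigvee_{i \in J} A_i) \wedge \bigwedge_{i \in J} (A_i \rightarrow B_i)) = Pr(\bigvee_{i \in J} A_i) - Pr(\bigvee_{i \in J} A_i \wedge \neg B_i) , \]

and $Pr(\bigvee_{i \in J} A_i \wedge \neg B_i) \geq Pr(C \wedge \neg B)$. Let $Pr$ be any probability distribution satisfying 
$\Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \Rightarrow_{\epsilon_i} B_i)$. So $Pr(A_j \wedge \neg B_j) \leq \epsilon_j Pr(A_j) \leq \epsilon_j Pr(\bigvee_{i \in J} A_i)$ for any $j \in J$, even if $Pr(A_j) = 0$. Consequently, 

\[ Pr(\bigvee_{i \in J} A_i \wedge \neg B_i) \leq \sum_{i \in J} Pr(A_i \wedge \neg B_i) \leq \Big( \sum_{i \in J} \epsilon_i \Big) Pr(\bigvee_{i \in J} A_i) . \]

Let $\delta = \sum_{j \in J} \epsilon_j$. So 

\[ Pr(C \wedge B) \geq \Big( \frac{1}{\delta} - 1 \Big) Pr(\bigvee_{i \in J} A_i \wedge \neg B_i) \geq \frac{1 - \delta}{\delta} Pr(C \wedge \neg B) , \]

which is the required inequality to show $C \Rightarrow_\delta B$. □ 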
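
The last step, from the final inequality to the default itself, is short but worth spelling out. The derivation below is a sketch that assumes the default $C \Rightarrow_\delta B$ is read as $Pr(B \mid C) \geq 1 - \delta$ when $Pr(C) > 0$ (the formal definition appears earlier in the paper and is not reproduced here).

```latex
% Start from  Pr(C \wedge B) >= ((1 - \delta)/\delta) Pr(C \wedge \neg B),  with 0 < \delta < 1.
\begin{align*}
  \delta \, Pr(C \wedge B) &\geq (1 - \delta) \, Pr(C \wedge \neg B)
                            = (1 - \delta) \big( Pr(C) - Pr(C \wedge B) \big) \\
  Pr(C \wedge B) &\geq (1 - \delta) \, Pr(C) \\
  Pr(B \mid C) &\geq 1 - \delta \qquad \text{when } Pr(C) > 0 .
\end{align*}
```

The error of the derived default is therefore bounded by the sum of the errors of the defaults used, which is the additive propagation behaviour the quantitative logic is meant to track.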

Proof of Theorem 4.6 parts 3 and 4 The sentence $\Diamond C$ is a consequence of $D$ if and only if $D \wedge \neg\Diamond C$ 
is inconsistent, by definition of consequence and inconsistency. Replacing $\neg\Diamond C$ by $\Box\neg C$ shows part 3 holds. 
Similarly, part 4 holds but this time we can simplify the inconsistency of $D \wedge \Diamond\neg C$ by using part 1 of the 
theorem. □ 

Proof of Corollary 4.6.1 Step 2 constructs $I_{max}$ as described in the proof to part 1 of the theorem. 
Step 3 then checks the unsatisfiability condition of part 1. □ 

Proof of Theorem 4.7 part 1 First prove $I_{min}$ exists and has a unique minimum. Notice $I_A$ is an upper 
bound on $I_{min}$, so some (but not necessarily unique) $I_{min}$ exists. Suppose a set $I'$ exists which is a subset 
of every possible $I_{min}$, and that $U \wedge \bigwedge_{i \in I'} \neg A_i \wedge A_j \wedge B_j$ is unsatisfiable for some $j \in I_A - I'$. Then this 
unsatisfiability will also hold for any $I_{min}$, so $j$ must also be in $I_{min}$. So we can place $j$ in $I'$ too. If we start 
with $I' = \emptyset$ and iterate this operation to a fixed point, we clearly obtain the unique $I' = I_{min}$ because an 
invariant of the operation is “any $I_{min}$ must be a superset of $I'$”. 

Suppose the unsatisfiability condition fails, that is, for each j £ Iv that U A»g i mim -’A,- A Vj is satisfiable. 
So there exist truth assignments demonstrating the satisfiable of these. There also exist truth assignments 
satisfying U A , € j m4m -•A, A Aj A Bj for j £ I A - J min , by the definition of I min . Take a probability distribution 
that makes each assignment in the first set infinitesimally small, each assignment in the second set equiprob- 
able, and any other truth assignments probability zero. So Pr(Vj ) > 0, and for j € I a ~ Imin, ■Pr(Bj|Aj) is 
greater than or arbitrarily close to > e ^ c - This distribution demonstrates D is consistent. 

Suppose $D$ is consistent. Consider any probability distribution $Pr$ that demonstrates this. Let $I = 
\{ i \in I_A : Pr(A_i) = 0 \}$. So $Pr(U \wedge \bigwedge_{i \in I} \neg A_i) = 1$. Since $Pr(V_j) > 0$ for each $j \in I_V$, it follows that 
$Pr(U \wedge \bigwedge_{i \in I} \neg A_i \wedge V_j) > 0$ as well, so the corresponding propositional sentence must be consistent. Also, 
$Pr(A_j) > 0$ for $j \in I_A - I$, so since $D$ is consistent $Pr(A_j \wedge B_j) > 0$ and $Pr(U \wedge \bigwedge_{i \in I} \neg A_i \wedge A_j \wedge B_j) > 0$, so 
the corresponding propositional sentence must be consistent. A side effect of this is that $I_{min} \subseteq I$, therefore 
the above satisfiability conditions holding for $I$ also hold for $I_{min}$, as required for the theorem. □ 

Proof of Theorem 4.7 part 2 First prove the only if part of the theorem. So assume C B is a 
consequence of D for some 6. It is sufficient to prove that if D is consistent and U A ->A, ACA ->B is 
satisfiable, then there exists an ordered subset of the indices in I a- I min, sucil formulas (5) 

and (6) are true. Do this by contradiction. Suppose there does not exist such an ordered set of indices. Then 
there exists an ordered subset of the indices in I a - Imin, I = * 2 » • • • , 8UC ^ fc ^ at formulas (5) are true, 

but formula (6) fails and there does not exist an index »&+ 1 such that formula (5) applies for that index. 
Note that this occurs only if 


U A»€/mi« ‘'Ai A A i A B i a «€/ “ ,i *« A A ■ B ) 
is satisfiable for every $j \in I_A - I_{min} - I$, and 

U A iei min -'Ai A ,-g/ -«A,- A C A -’B 

is satisfiable. Call the set of 1 I a -Imin — 7j truth assignments satisfying the first form above Ti, and the truth 
assignment satisfying the second form X 2 . By the definition of I min , we have that U A.6/»<« 7^' A 4* A “ 
satisfiable for j € 7. Call the set of |7| truth assignments satisfying this T3. Finally, since D is consistent, we 
ak n have that U A ~‘Ai A Vj is satisfiable for j € 7v. Cadi the set of |Jv| truth assignments satisfying 
this T4. Now for 17 vanishingly smadl, consider the probability distribution Pr that makes truth assignments 

in T4 have probability those in T3 have probability , the one in T 2 have probability r){ 1 — J?), 

those in T\ have probability ^d any other truth assignment have probability 0. This makes 

Pr(Bj\Aj) > for j% lT~ Imin ~ 7, Pr(->B\C) > 1 - * etc. Clearly, Pr with a suitable value 

of t; can be used to* demonstrate D A -.(C a*/ B) is consistent for any / < 1. So we have proven the 
contradiction. 

Next prove the if part of the theorem. Clearly, consequence holds if D is inconsistent. So assume it 
is consistent, and assume without loss of generality that ij — j for notational convenience. Consider any 
probability distribution Pr such that 

{=Pr ^17 A oVi A i£l A Ai • 

So for each j € I a Imin 1 

Pr(Aj A Bj)> I — Pr{Aj A ->Bj) , ( 18 ) 

1 — el- 
even if Pr(Aj) — 0. From part 1 of the theorem we also know that Pr(U Aj ^j mim “ 1 Ai ) = 1, so this term can 
be effectively ignored in probability statements that follow. If h = 0, then from formula (6) it follows that 
p r (C -> B)=l, which implies Pr(B | C) - 1, so the if part holds. Otherwise, h > 1. From formulas (5) 

and (6), we have that 

[= U A i€/ w <« ““'Ai A k<h (A* “* Bk) -* (C — ♦ B) , 

^Pr(A fc A-iBjt) > Pr(V k < h A k A-'Bk) > Pr(C A -»B) . (19) 

k<h 

Also, there must exist sets P, C {1, . . ., j - 1} such that formulas (5) can be replaced by 

[= U A “’’•A. A Aj A Bj A k€Pj -* (C A B) , 

for j = 1, ... , h. Then 

^ Pr(Afe) > Pr(Vfc 6Pi A fc ) > Pr(Aj A Bj A ->(C A B)) . (20) 

k£Pj 

These inequalities are strung together below to produce the desired result. 

For j € 1, . • • , h, define 

(3j = min e k , 

1 kePj 



( 21 ) 


We shall prove by induction on j that 

h " h h 

'f2 j r»PHA l .AS t ACAB) + '£ i T Y Pt(a,ab,) >Y p ^ Ath ^ Bl) - 

k=j Pi I : l£Pk,Kj k =J 

Assume it is true for j + 1. Consider the case for j. Notice that by formula (18) 

lZliPr(Aj A ft) + TT 13 Pr ( A * A B ») 

e i p > i ePj 

> Pr(Aj A -•ft) + “r 53 e * Pr ( A ') 

Pj iePj 

> Pr(Aj A ->ft) + 7 j 53 Pr W * 

itPj 

Adding this to the inequality for the induction hypothesis, formula (21) with j + 1, we get that 

h : * '** ~ h 

1 i PT(A i AB i )+ nPr(A t ^kA22%B&Y%: T, Pr ( j4,AB ') 

»=i p ‘ meftW 

h 

> 53 Pr (^* A + t# X) Pr W • 

fc=; »€/>; 


So 

& h 

53 7 fcPr(Afc A Bi A C A B) -f Ip ^3 Pr (‘^ 1 A Pf ) 

le—j k=j * l ’ l€Pk,l<j 

> Y Prt - Ai A-B k ) + v f Y Pr{A,) - Pr{Ai A B i A " (C A B)) j • 

By formula (20), the induction step is proven. Notice that this same argument works for the base case of 
the induction proof, if we start at j = h using 0 > 0, so the induction proof is complete. 

Finally, for j - 1 in formula (21), we have that 


53 7 kPr(A k A B fc A C A B) > 53 Pr ( A * A “* Bk ) * 


By formula (19), it follows that 



Pr(CAB) > Pr(CA-B). 


So 


Pr{B\C) > 


1 + Ei=i T* 


The right-hand side of this inequality gives the error propagation function f for this consequence. This 
can be evaluated using the definitions of 7 y, ft and Pj given previously. Clearly, the smallest this error 
propagation function can be is when Pj = { 1 , . . j - 1} and ft = d = « for j < h. In this case, a simple 
induction proof shows that 


= 

Summing these over $j$ and simplifying gives 


1 -e 


K) 


h-j 


5 ~ - _eUl + M* " G + «) 


e+(l-e)(l + i) 

In contrast, if Pj = 0, and Pj — e, then / > □ 


Proof of Theorem 4.7 parts 3 and 4 The same as for Theorem 4.6 parts 3 and 4. Notice, also, that 
$I_{min}$ remains the same if a possibility is added to $D$. □ 


Proof of Corollary 4.7.1 The if part of the corollary follows from Theorem 4.7 part 2. Notice that if 
$P_j = \emptyset$ for each $j$, then the error propagation function developed in the proof of Theorem 4.7 part 2 becomes 

i + ELi 1 ?? 1 ’ 

This behaves linearly for small $\epsilon_i$. If any $P_j \neq \emptyset$, however, this linear behaviour no longer exists. □ 

Proof of Corollary 4.7.2 The repeat loop in Step 2 simply performs the construction described in the 
proof of Theorem 4.7 part 1. This iteratively builds up $I_{min}$. The repeat loop terminates when for all 
$j \in I_A - I$, $U \wedge \bigwedge_{i \in I} \neg A_i \wedge A_j \wedge B_j$ is satisfiable. Otherwise, the algorithm is a direct implementation of 
Theorem 4.7 part 1. □ 


Proof of Corollary 4.7.3 The algorithm builds the ordered set of indices in turn. Clearly, if it fails at 
step 3(c), then no such ordered set can exist. □ 
Figure 1: Quantitative measures of beliefs in A 

Figure 2: Qualitative measures of beliefs in A 


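Schema   Theorem                                                            Dual

T1       $\Box A$ (when $\models A$)                                        $\neg\Diamond A$ (when $\models \neg A$)

T2       $\Box(A \rightarrow B) \rightarrow (\Box A \rightarrow \Box B)$    $(\neg\Diamond A \wedge \Diamond B) \rightarrow \Diamond(\neg A \wedge B)$

T3       $(\Box A \wedge \Box B) \leftrightarrow \Box(A \wedge B)$          $(\Diamond A \vee \Diamond B) \leftrightarrow \Diamond(A \vee B)$

T4       $\Box A \rightarrow \Diamond A$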

*4 

o 

T 

□ 


Table 1: Theorem schemata (and duals) on “$\Box$” and “$\Diamond$” 


T5 

OA «-» Po A 

o A p-o A 

T6 

□(A — * B) (p« A — ♦ B) 

□(A —* B) -+ (p>-e A — » F^ e B) 

T7 

-’(l => t A A ->A) 

(p the A V F>- e lA) 

T8 

oA A (A P, B) — °P 

(oA A OB) -* A 9Hb e B 


Table 2: Theorem schemata (and duals) relating U P” and “ft*-” to and “o” 


T9 

A P e A 

A P- c A 

T10 

A p« B — | p f (A — B) 

p- e (AAB) — A s$- e B 

Til 

(p« A A F><B) -* (A A B) 

p- e +d (A V B) — (F^e AVp-jB) 

T12 

p« A — ♦ (A P$ B — » P*+* B) 

p e A -* (FJ>“d B — » A B) 

T13 

p< A — (p* B —* A => t +6 B) 

p« A — ♦ (A P-d B — » « B) 

T14 

A — » (p e B -* A =>- 2 i B) 

pj»- e A — »• (Ap-dB — » B) 

T15 

(A =>, C) A (B =*•< C) - (4 V B) =*■,+« C 

(AVB)«- I+( |C — (A P- e C V B P-d C) 

T16 

F»- e (A A B) — ► (-<A P« ~*B —* B P^ A) 

p- e (A A B) — » (B P-d “>A — * -'A P- ja B) 

T17 

fAvB => «« , 0) — * (A => e CvB^< C) 

(A P- e C) A (B P-d C) -* A V B P- ^3 0 


Table 3: Theorem schemata (and duals) on “p” and P- 
T6' 

□(CAA 

-B) - (C=>, 

A — ♦ C =>e 

5) 

T7' 

(C=> 

e A) A (C =>> “»A) — ► Q-*C 


Til' 

(C=>e A) 

A (C =>< 2?) — ♦ 

C =►«+* (A - 

r^B) 

T12' 

C => e A 

— * (C A A =>$ B 

C «+ ^ 

B) 

T13' 

C=> t A 

— ► (C =>$ B —* 

CAA =►«+$ 

B ) 

T14' 

C »*>- e A 

- (C =*< B - 

C A A =>ii 

e 

B) 


Table 4: Conditioned theorem schemata 


I 


Input: A QDP sentence 

    $\Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} A_i \Rightarrow B_i$ . 

Output: The consistency or inconsistency of the sentence. 

Algorithm: Construct $I_{max}$ then check satisfiability. 

1. Let $I = I_A$. 

2. Repeat, 

   (a) Find a $j \in I$ such that $U \wedge A_j \wedge \bigwedge_{i \in I} (A_i \rightarrow B_i)$ is satisfiable. 

   (b) If a $j$ found, $I = I - \{j\}$. 

   Until no $j$ found or $I = \emptyset$. 

3. If for some $j \in I_V$, $U \wedge V_j \wedge \bigwedge_{i \in I} \neg A_i$ is unsatisfiable, return inconsistent. 

4. Return consistent. 


Figure 3: The defaults-consistency algorithm 
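
The steps of Figure 3 can be sketched in Python. This is an illustrative sketch only, not the paper's implementation: satisfiability is tested by brute-force enumeration of truth assignments (any propositional SAT procedure could be substituted), formulas are represented as Python predicates over an assignment dictionary, and the names `satisfiable`, `defaults_consistent` and the bird/penguin example are assumptions introduced here.

```python
from itertools import product

def satisfiable(formula, atoms):
    """Brute-force satisfiability: try every truth assignment over `atoms`."""
    for values in product([False, True], repeat=len(atoms)):
        if formula(dict(zip(atoms, values))):
            return True
    return False

def defaults_consistent(atoms, U, possibilities, defaults):
    """Follow steps 1-4 of Figure 3.

    U             -- predicate for the necessity (takes a dict atom -> bool)
    possibilities -- list of predicates V_j
    defaults      -- list of (A_i, B_i) predicate pairs
    """
    I = set(range(len(defaults)))                 # Step 1: I = I_A

    def material(m, members):                     # conjunction of A_i -> B_i
        return all((not defaults[i][0](m)) or defaults[i][1](m) for i in members)

    # Step 2: drop any j whose antecedent is compatible with U and the
    # material counterparts of the defaults still in I.
    while I:
        members = tuple(I)
        j = next((j for j in sorted(I)
                  if satisfiable(lambda m: U(m) and defaults[j][0](m)
                                 and material(m, members), atoms)), None)
        if j is None:
            break
        I.remove(j)

    # Step 3: each possibility must be satisfiable once the antecedents
    # remaining in I are ruled out.
    members = tuple(I)
    for V in possibilities:
        if not satisfiable(lambda m: U(m) and V(m)
                           and all(not defaults[i][0](m) for i in members), atoms):
            return False
    return True                                   # Step 4

# Hypothetical example: penguins are birds; birds fly; penguins do not fly.
ATOMS = ["bird", "penguin", "fly"]
U = lambda m: (not m["penguin"]) or m["bird"]
bird_fly = (lambda m: m["bird"], lambda m: m["fly"])
penguin_nofly = (lambda m: m["penguin"], lambda m: not m["fly"])
bird_nofly = (lambda m: m["bird"], lambda m: not m["fly"])
possible_penguin = lambda m: m["penguin"]
possible_bird = lambda m: m["bird"]
```

With these definitions, the penguin rule set together with the possibility of a penguin is reported consistent, whereas "birds fly" and "birds do not fly" become inconsistent as soon as a bird is declared possible. The outer loop runs at most $|I_A|$ times with at most $|I_A|$ tests each, in line with the quadratic number of satisfiability tests claimed for the method.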
Input: A QDP sentence 

OU A ,-gj v oVi A jg/ x A,- 2?,- . 

Output: The consistency or inconsistency of the sentence. 

Algorithm: First construct $I_{min}$, then check the possibility conditions. 

1. Let $I = \emptyset$. 

2. Repeat, 

   (a) Find some $j \in I_A - I$ such that $U \wedge \bigwedge_{i \in I} \neg A_i \wedge A_j \wedge B_j$ is unsatisfiable. 

   (b) If some $j$ found and $\Diamond A_j$ is in the possibilities in the input sentence, then 
       return inconsistent. 

   (c) Else, if some $j$ found, $I = I \cup \{j\}$. 

   Until no $j$ found. 

3. $I$ is now equal to $I_{min}$. If for some $j \in I_V$, $U \wedge \bigwedge_{i \in I} \neg A_i \wedge V_j$ is unsatisfiable, return 
   inconsistent. 

4. Else return consistent. 


Figure 4: The likelihood-consistency algorithm 
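
A corresponding sketch of Figure 4 follows. As before, this is an assumption-laden illustration rather than the paper's code: satisfiability is checked by enumeration, each likelihood is represented as a pair of predicates (antecedent $A_i$, consequent $B_i$), and step 2(b) is approximated by testing whether a possibility is the very same predicate object as the antecedent. The names `likelihoods_consistent` and the rain/wet example are hypothetical.

```python
from itertools import product

def satisfiable(formula, atoms):
    """Brute-force satisfiability: try every truth assignment over `atoms`."""
    for values in product([False, True], repeat=len(atoms)):
        if formula(dict(zip(atoms, values))):
            return True
    return False

def likelihoods_consistent(atoms, U, possibilities, likelihoods):
    """Follow steps 1-4 of Figure 4.

    U             -- predicate for the necessity (takes a dict atom -> bool)
    possibilities -- list of predicates V_j
    likelihoods   -- list of (A_i, B_i) predicate pairs
    """
    I = set()                                     # Step 1: I is empty
    while True:                                   # Step 2
        members = tuple(I)
        found = None
        for j in range(len(likelihoods)):
            if j in I:
                continue
            A_j, B_j = likelihoods[j]
            # (a) A_j and B_j cannot hold together once antecedents in I are excluded.
            if not satisfiable(lambda m: U(m) and A_j(m) and B_j(m)
                               and all(not likelihoods[i][0](m) for i in members),
                               atoms):
                found = j
                break
        if found is None:
            break
        # (b) The antecedent is forced to probability zero but declared possible.
        if any(V is likelihoods[found][0] for V in possibilities):
            return False
        I.add(found)                              # (c)

    # Step 3: I now plays the role of I_min; check every possibility.
    members = tuple(I)
    for V in possibilities:
        if not satisfiable(lambda m: U(m) and V(m)
                           and all(not likelihoods[i][0](m) for i in members), atoms):
            return False
    return True                                   # Step 4

# Hypothetical example predicates.
ATOMS = ["rain", "wet"]
U = lambda m: True
rain = lambda m: m["rain"]
wet = lambda m: m["wet"]
dry = lambda m: not m["wet"]
never = lambda m: m["wet"] and not m["wet"]       # unsatisfiable consequent
rain_and_wet = lambda m: m["rain"] and m["wet"]
```

In this sketch, "rain makes wet likely" and "rain makes dry likely" are jointly consistent even when rain is possible, while a likelihood with an unsatisfiable consequent forces its antecedent into $I_{min}$ and so conflicts with any possibility that requires that antecedent.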
Input: A likelihood C (V- m D, a consistent QDp sentence 

□ U A ig/v oVj A i£l A Ai Bi , 

and its index set I m in- 

Output: Whether the likelihood is a consequence of the sentence for some value of m. 

Algorithm: Build up the ordered subset of I a iteratively. 

1. If 17 A, e j mjm ->A, A C A ~>B is unsatisfiable, return is a consequence for any m. 

2. Set I = 0. 

3. Repeat, 

(a) Find some j € Ia — Imin — I such that 

|= U A »ei min -'Ai A Aj A Bj A »ej (A,- — ♦ Bi) — ► CAB . 

(b) If some j found, I = I U {j}. 

(c) Else return not a consequence. 

Until U A,gj mjm -'A,- AC A ->B A , € j A,- is unsatisfiable ox I = Ia Imin • 

4. If the loop terminated only because I = Ia — Imin > return not a consequence. 

5. Else return is a consequence for some m. 


Figure 5: The likelihood-consequence algorithm 



Figure 6: Dependency network for “can Joe read and write?” 